Method for Rapidly Screening Microbial Hosts to Identify Certain Strains with Improved Yield and/or Quality in the Expression of Heterologous Proteins

ABSTRACT

The present invention provides an array for rapidly identifying a host cell population capable of producing heterologous protein with improved yield and/or quality. The array comprises one or more host cell populations that have been genetically modified to increase the expression of one or more target genes involved in protein production, decrease the expression of one or more target genes involved in protein degradation, or both. One or more of the strains in the array may express the heterologous protein of interest in a periplasm compartment, or may secrete the heterologous protein extracellularly through an outer cell wall. The strain arrays are useful for screening for improved expression of any protein of interest, including therapeutic proteins, hormones, growth factors, extracellular receptors or ligands, proteases, kinases, blood proteins, chemokines, cytokines, antibodies and the like.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a Continuation-in-Part of U.S. application Ser. No. 12/109,554, filed Apr. 25, 2008, incorporated by reference herein, which claims the benefit of U.S. Provisional Application No. 60/914,361, filed Apr. 27, 2007, incorporated by reference herein.

REFERENCE TO A SEQUENCE LISTING SUBMITTED AS A TEXT FILE VIA EFS-WEB

The official copy of the sequence listing is submitted concurrently with the specification as a text file via EFS-Web, in compliance with the American Standard Code for Information Interchange (ASCII), with a file name of 38194701.txt, a creation date of Oct. 30, 2009 and a size of 352 KB. The sequence listing filed via EFS-Web is part of the specification and is herein incorporated by reference in its entirety.

FIELD OF THE INVENTION

This invention relates to the field of protein production, particularly to identifying optimal host cells for large-scale production of properly processed heterologous proteins.

BACKGROUND OF THE INVENTION

More than 150 recombinantly produced proteins and polypeptides have been approved by the U.S. Food and Drug Administration (FDA) for use as biotechnology drugs and vaccines, with another 370 in clinical trials. Unlike small molecule therapeutics that are produced through chemical synthesis, proteins and polypeptides are most efficiently produced in living cells. However, current methods of production of recombinant proteins in bacteria often produce improperly folded, aggregated or inactive proteins, and many types of proteins require secondary modifications that are inefficiently achieved using known methods.

Numerous attempts have been made to increase production of properly folded proteins in recombinant systems. For example, investigators have changed fermentation conditions (Schein (1989) Bio/Technology, 7:1141-1149), varied promoter strength, or used overexpressed chaperone proteins (Hockney (1994) Trends Biotechnol. 12:456-463), which can help prevent the formation of inclusion bodies.

Strategies have been developed to excrete proteins from the cell into the supernatant. See, for example, U.S. Pat. No. 5,348,867; U.S. Pat. No. 6,329,172; PCT Publication No. WO 96/17943; PCT Publication No. WO 02/40696; and U.S. Application Publication 2003/0013150. Other strategies for increased expression are directed to targeting the protein to the periplasm. Some investigations focus on non-Sec type secretion (see, e.g., PCT Publication No. WO 03/079007; U.S. Publication No. 2003/0180937; U.S. Publication No. 2003/0064435; and PCT Publication No. WO 00/59537). However, the majority of research has focused on the secretion of exogenous proteins with a Sec-type secretion system.

A number of secretion signals have been described for use in expressing recombinant polypeptides or proteins. See, for example, U.S. Pat. No. 5,914,254; U.S. Pat. No. 4,963,495; European Patent No. 0 177 343; U.S. Pat. No. 5,082,783; PCT Publication No. WO 89/10971; U.S. Pat. No. 6,156,552; U.S. Pat. Nos. 6,495,357; 6,509,181; 6,524,827; 6,528,298; 6,558,939; 6,608,018; 6,617,143; U.S. Pat. Nos. 5,595,898; 5,698,435; and 6,204,023; U.S. Pat. No. 6,258,560; PCT Publication Nos. WO 01/21662, WO 02/068660 and U.S. Application Publication 2003/0044906; U.S. Pat. No. 5,641,671; and European Patent No. EP 0 121 352.

Heterologous protein production often leads to the formation of insoluble or improperly folded proteins, which are difficult to recover and may be inactive. Furthermore, the presence of specific host cell proteases may degrade the protein of interest and thus reduce the final yield. There is no single factor that will improve the production of all heterologous proteins. As a result, there is a need in the art for identifying improved large-scale expression systems capable of secreting and properly processing recombinant polypeptides to produce transgenic proteins in properly processed form.

SUMMARY OF THE INVENTION

The present invention provides compositions and methods for rapidly identifying a host cell population capable of producing at least one heterologous polypeptide according to a desired specification with improved yield and/or quality. The compositions comprise two or more host cell populations that have been genetically modified to increase the expression of one or more target genes involved in protein production, decrease the expression of one or more target genes involved in protein degradation, express a heterologous gene that affects the protein product, or a combination. The ability to express a polypeptide of interest in a variety of modified host cells provides a rapid and efficient means for determining an optimal host cell for the polypeptide of interest. The desired specification will vary depending on the polypeptide of interest, but includes yield, quality, activity, and the like.

It is recognized that the host cell populations may be modified to express many combinations of nucleic acid sequences that affect the expression levels of endogenous sequences and/or exogenous sequences that facilitate the expression of the polypeptide of interest. In one embodiment, two or more of the host cell populations have been genetically modified to increase the expression of one or more target genes involved in one or more of the proper expression, processing, and/or translocation of a heterologous protein of interest. In another embodiment, the target gene is a protein folding modulator. In another embodiment, the protein folding modulator is selected from the list in Table 1.

In another embodiment, one or more of the host cell populations has been genetically modified to decrease the expression of one or more target genes involved in proteolytic degradation. In another embodiment, the target gene is a protease. In another embodiment, the protease is selected from the list in Table 2.

In one embodiment, nucleotide sequences encoding the proteins of interest are operably linked to a P. fluorescens Sec system secretion signal as described herein. One or more of the strains in the array may express the heterologous protein of interest in a periplasm compartment. In certain embodiments, one or more strains may also secrete the heterologous protein extracellularly through an outer cell wall.

Host cells include eukaryotic cells, including yeast cells, insect cells, mammalian cells, plant cells, etc., and prokaryotic cells, including bacterial cells such as P. fluorescens, E. coli, and the like.

As indicated, the library of host cell populations can be rapidly screened to identify certain strain(s) with improved yield and/or quality of heterologously expressed protein. The strain arrays are useful for screening for improved expression of any protein of interest, including therapeutic proteins, hormones, growth factors, extracellular receptors or ligands, proteases, kinases, blood proteins, chemokines, cytokines, antibodies and the like.

The invention includes a method of assembling an array of expression systems for testing expression of at least one heterologous protein, said method comprising: placing in separate addressable locations at least 10 nonidentical test expression systems, said at least 10 nonidentical test expression systems each comprising a different combination of a) a Pseudomonad or E. coli host cell population, and b) at least one expression vector encoding the at least one heterologous protein, wherein the array includes at least 5 different host cell populations and at least 2 different expression vectors, and further wherein at least 3 of said at least 5 different host cell populations are deficient in their expression of at least one protease; and wherein at least one of the nonidentical test expression systems overexpresses the at least one heterologous protein.
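By way of illustration only, the minimum composition recited above (at least 10 nonidentical host/vector combinations, at least 5 host cell populations, at least 2 expression vectors, and at least 3 protease-deficient host cell populations) can be checked with a short sketch such as the following. The class name, field names, and strain and vector identifiers are hypothetical and are not taken from the specification; the sketch does not test the further requirement that at least one system overexpresses the heterologous protein, since that is a measured outcome.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpressionSystem:
    """One test expression system: a host population paired with a vector.

    All identifiers are hypothetical labels chosen for illustration.
    """
    host: str
    vector: str
    protease_deficient: bool  # True if the host lacks at least one protease


def meets_minimum_composition(systems, min_systems=10, min_hosts=5,
                              min_vectors=2, min_deficient_hosts=3):
    """Return True if the array satisfies the minimum counts recited above."""
    combos = {(s.host, s.vector) for s in systems}      # nonidentical systems
    hosts = {s.host for s in systems}
    vectors = {s.vector for s in systems}
    deficient = {s.host for s in systems if s.protease_deficient}
    return (len(combos) >= min_systems
            and len(hosts) >= min_hosts
            and len(vectors) >= min_vectors
            and len(deficient) >= min_deficient_hosts)


# Example: 5 hypothetical hosts (3 protease-deficient) x 2 hypothetical vectors.
example_array = [
    ExpressionSystem(host=f"host{i}", vector=f"vector{j}",
                     protease_deficient=(i <= 3))
    for i in range(1, 6)
    for j in range(1, 3)
]
print(meets_minimum_composition(example_array))
```

An array that drops one of the five hosts falls below both the 10-system and 5-host minimums, so the same check would return False.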

In embodiments, the at least 2 different expression vectors each encode a different heterologous protein. In other embodiments, the array includes at least 5 different expression vectors, and wherein each of said at least 5 different expression vectors encodes a different heterologous protein. In embodiments, at least one expression vector encodes 2 different heterologous proteins. In other embodiments, at least 20 nonidentical test expression systems are placed in separate addressable locations, and wherein the array includes at least 10 different host cell populations and at least 2 different expression vectors, and further wherein at least 5 of said at least 10 different host cell populations are deficient in their expression of at least one protease. In other embodiments at least 50 nonidentical test expression systems are placed in separate addressable locations, and wherein the array includes at least 20 different host cell populations and at least 3 different expression vectors, and further wherein at least 10 of said at least 20 different host cell populations are deficient in their expression of at least one protease. In related embodiments, the overexpression of the heterologous protein in the at least one nonidentical test expression system is an increase in yield of about 1.5-fold to about 100-fold, relative to the yield in an indicator expression system. In other embodiments, the overexpression is a yield of the heterologous protein in the at least one nonidentical test expression system of about 10 mg/liter to about 2000 mg/liter.

In related embodiments, the increase in yield is about 1.5-fold to about 2-fold, about 2-fold to about 3-fold, about 3-fold to about 4-fold, about 4-fold to about 5-fold, about 5-fold to about 6-fold, about 6-fold to about 7-fold, about 7-fold to about 8-fold, about 8-fold to about 9-fold, about 9-fold to about 10-fold, about 10-fold to about 15-fold, about 15-fold to about 20-fold, about 20-fold to about 25-fold, about 25-fold to about 30-fold, about 30-fold to about 35-fold, about 35-fold to about 40-fold, about 45-fold to about 50-fold, about 50-fold to about 55-fold, about 55-fold to about 60-fold, about 60-fold to about 65-fold, about 65-fold to about 70-fold, about 70-fold to about 75-fold, about 75-fold to about 80-fold, about 80-fold to about 85-fold, about 85-fold to about 90-fold, about 90-fold to about 95-fold, or about 95-fold to about 100-fold. In other related embodiments, the yield of the heterologous protein is about 10 mg/liter to about 20 mg/liter, about 20 mg/liter to about 50 mg/liter, about 50 mg/liter to about 100 mg/liter, about 100 mg/liter to about 200 mg/liter, about 200 mg/liter to about 300 mg/liter, about 300 mg/liter to about 400 mg/liter, about 400 mg/liter to about 500 mg/liter, about 500 mg/liter to about 600 mg/liter, about 600 mg/liter to about 700 mg/liter, about 700 mg/liter to about 800 mg/liter, about 800 mg/liter to about 900 mg/liter, about 900 mg/liter to about 1000 mg/liter, about 1000 mg/liter to about 1500 mg/liter, or about 1500 mg/liter to about 2000 mg/liter. Included are embodiments wherein the indicator expression system comprises a second nonidentical test expression system in the array or a standard expression system.

In other embodiments, the yield of the heterologous protein is a measure of the amount of soluble heterologous protein, the amount of recoverable heterologous protein, the amount of properly processed heterologous protein, the amount of properly folded heterologous protein, the amount of active heterologous protein, and/or the total amount of heterologous protein. The invention includes methods wherein the optimal expression system is selected from among the test expression systems based on the increased yield of the heterologous protein in the test expression system relative to that in the indicator expression system. In certain embodiments, an optimal expression system is selected from among the test expression systems based on the yield of the heterologous protein in the test expression system.

The invention also includes methods for selecting an optimal expression system comprising using the array assembled using the methods of the invention, and an array assembled using the methods of the invention. In embodiments, at least 2 of said at least 5 different expression systems overexpress at least one folding modulator. In other embodiments, the at least one folding modulator is selected from the folding modulators listed herein in Table 1 and Table 2. In embodiments, the at least one folding modulator is expressed from a plasmid. In certain embodiments, at least one host cell population is defective in at least one to about eight proteases. In other embodiments, the at least one to about eight proteases are selected from the proteases listed in Table 1 and Table 2. In embodiments, the methods of the invention include determining the number of cysteine residues in, the presence of clustered prolines in, the requirement of an N-terminal methionine for activity of, or the presence of a small amino acid in the plus two position of, the heterologous protein. In certain embodiments, when the heterologous protein has more than two cysteine residues, at least one of said at least 2 different expression systems overexpressing a folding modulator overexpresses a disulfide isomerase/oxidoreductase. In embodiments, the disulfide isomerase/oxidoreductase is encoded on a plasmid. In embodiments, when the heterologous protein has more than four cysteine residues, at least one of said at least 2 different expression vectors encoding the heterologous protein contains a periplasmic secretion leader sequence. In other embodiments, when the heterologous protein has more than four cysteine residues, at least one of said at least 2 different expression vectors encoding the heterologous protein contains a high or medium ribosome binding sequence.

In embodiments, said at least one of said at least 2 different expression vectors encoding the heterologous protein and containing a periplasmic secretion leader sequence is included in at least one expression system that overexpresses at least one periplasmic chaperone, and at least one expression system that overexpresses at least one cytoplasmic chaperone. In other embodiments, when the heterologous protein has fewer than four cysteine residues, at least one of said at least 2 different expression vectors encoding the heterologous protein does not contain a periplasmic secretion leader sequence, and further wherein said at least one of said at least 2 different expression vectors encoding the heterologous protein and not containing a periplasmic secretion leader sequence is included in at least one expression system that overexpresses at least one cytoplasmic chaperone. In other embodiments, when clustered prolines are present, at least one expression system that overexpresses at least one 2+ peptidyl-prolyl cis-trans isomerase (PPIase) is included in the array. In certain embodiments, the 2+ peptidyl-prolyl cis-trans isomerase (PPIase) is encoded on a plasmid. In other embodiments, when the N-terminal methionine is required, at least one expression system comprising a host cell population that has at least one defect in at least one methionyl amino peptidase is included in the array. In embodiments, when a small amino acid is present in the plus two position of the heterologous protein, at least one expression system comprising a host cell population that has at least one defect in at least one amino peptidase is included in the array.

In embodiments, the small amino acid is selected from the group consisting of: glycine, alanine, valine, serine, threonine, aspartic acid, asparagine, and proline. In embodiments, the heterologous protein is a toxin. In specific embodiments, the toxin is a vertebrate or invertebrate animal toxin, a plant toxin, a bacterial toxin, a fungal toxin, or variant thereof. In other embodiments, the heterologous protein is a cytokine, growth factor or hormone, or receptor thereof. In certain embodiments, the heterologous protein is an antibody or antibody derivative. In specific embodiments, the antibody or antibody derivative is a humanized antibody, modified antibody, nanobody, bispecific antibody, single-chain antibody, Fab, Domain antibody, shark single domain antibody, camelid single domain antibody, linear antibody, diabody, or BiTE molecule. In other embodiments, the heterologous protein is a human therapeutic protein or therapeutic enzyme, a non-natural protein, a fusion protein, a chaperone, a pathogen protein or pathogen-derived antigen, a lipoprotein, a reagent protein, or a biocatalytic enzyme. In embodiments of the invention, at least 10% of the heterologous protein is insoluble when expressed in an indicator strain, or wherein the heterologous protein is predicted to be insoluble using a protein solubility prediction tool.

The invention additionally includes a method for selecting an optimal expression system for overexpressing at least one heterologous protein, said method comprising: assembling an array by placing in separate addressable locations at least 10 nonidentical test expression systems, said at least 10 nonidentical test expression systems each comprising a different combination of a) a Pseudomonad or E. coli host cell population, and b) at least one expression vector encoding the at least one heterologous protein, wherein the array includes at least 5 different host cell populations and at least 2 different expression vectors, and further wherein at least 3 of said at least 5 different host cell populations are deficient in their expression of at least one protease; measuring the yield of the heterologous protein expressed; and selecting at least one optimal expression system from among the test expression systems based on the yield of the heterologous protein measured. In embodiments, the yield of the heterologous protein is about 1.5-fold to about 100-fold higher in the at least one optimal expression system relative to that in an indicator expression system. In other embodiments, the yield of the heterologous protein in the at least one optimal expression system is about 10 mg/liter to about 2000 mg/liter. In certain embodiments, the indicator expression system comprises a second nonidentical test expression system in the array or a standard expression system. In embodiments, the yield of the heterologous protein is a measure of the amount of soluble heterologous protein, the amount of recoverable heterologous protein, the amount of properly processed heterologous protein, the amount of properly folded heterologous protein, the amount of active heterologous protein, and/or the total amount of heterologous protein.
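The measuring and selecting steps above can be sketched, for illustration only, as a comparison of each test system's yield against the yield of an indicator expression system. The function names, the system labels, and the yield values below are hypothetical; the 1.5-fold default threshold is taken from the lower end of the range stated above, and the choice of yield measure (soluble, active, total, and so on) is left to the user.

```python
def fold_increase(yields_mg_per_liter, indicator):
    """Return each test system's yield divided by the indicator's yield."""
    baseline = yields_mg_per_liter[indicator]
    if baseline <= 0:
        raise ValueError("indicator yield must be positive to compute a fold change")
    return {name: value / baseline
            for name, value in yields_mg_per_liter.items()
            if name != indicator}


def select_optimal(yields_mg_per_liter, indicator, min_fold=1.5):
    """Pick the test system with the highest fold increase at or above min_fold.

    Returns None when no test system reaches the threshold.
    """
    folds = fold_increase(yields_mg_per_liter, indicator)
    candidates = {name: f for name, f in folds.items() if f >= min_fold}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Hypothetical measured yields (mg/liter) for one heterologous protein.
measured = {"indicator": 20.0, "system_A": 30.0, "system_B": 100.0, "system_C": 25.0}
print(fold_increase(measured, "indicator"))
print(select_optimal(measured, "indicator"))
```

Selecting on an absolute yield instead (for example, at least 10 mg/liter) would only require filtering the measured values directly rather than their ratios.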

The invention also includes an array of expression systems for testing expression of at least one heterologous protein, said array comprising: at least 10 nonidentical test expression systems in separate addressable locations, said at least 10 nonidentical test expression systems each comprising a different combination of a) a Pseudomonad or E. coli host cell population, and b) at least one expression vector encoding at least one heterologous protein, wherein the array includes at least 5 different host cell populations and at least 2 different expression vectors, and further wherein at least 3 of said at least 5 different host cell populations are deficient in their expression of at least one protease; and wherein at least one of the nonidentical test expression systems overexpresses the heterologous protein.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1A depicts plasmid pDOW1261-2 used for engineering genomic deletion in P. fluorescens. FIG. 1B is a schematic drawing of the construction of a gene X deletion.

FIG. 2 is a Western blot analysis of soluble cell fractions prepared at 0 and 24 hours post-induction (I0 and I24, respectively) in Δpra1, ΔdegP2, ΔLa2 and the grpEdnaKJ co-expression strains (Example 6). The top arrows point to the fully assembled monoclonal antibody in the co-expressed strains but not in the control (DC440). r=recombinant; n-r=nonrecombinant.

FIG. 3 shows growth curves for P. fluorescens (filled symbols) and E. coli (open symbols) expression clones. Elapsed fermentation time in hours is shown on the X-axis and optical density measured at 600 nm (A₆₀₀) is shown on the Y-axis. The arrow indicates time of induction of P. fluorescens cultures.

FIG. 4 shows SDS-PAGE (A) and Western blot (B) analysis of soluble (S) and insoluble (I) samples following 24 hours induction of E. coli (E) or P. fluorescens (P) cultures. Molecular weight markers are indicated in the first lane. Proteins expressed are indicated at the bottom of the gel image.

FIG. 5 provides the sequences of the elements of SEQ ID NOS: 138-157, including open reading frames for folding modulators and proteases.

FIG. 6 provides sequences, including leader sequences, also provided herein in the sequence listing. These sequences are disclosed in U.S. Patent App. Pub. No. 2008/0193974, “Bacterial leader sequences for increased expression.”

FIG. 7 provides sequences, including leader sequences, also provided herein in the sequence listing as SEQ ID NOS: 208-327. These sequences are disclosed in U.S. Patent App. Pub. No. US2006/0008877, “Expression systems with Sec-secretion.”

DETAILED DESCRIPTION

The present inventions now will be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all embodiments of the invention are shown. Indeed, these inventions may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy applicable legal requirements.

Many modifications and other embodiments of the inventions set forth herein will come to mind to one skilled in the art to which these inventions pertain having the benefit of the teachings presented in the foregoing descriptions and the associated drawings.

Therefore, it is to be understood that the inventions are not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the invention. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.

Overview

Compositions and methods for identifying an optimal host strain, e.g., a Pseudomonas fluorescens host strain, for producing high levels of properly processed heterologous polypeptides in a host cell are provided. In particular, a library (or “array”) of host strains is provided, wherein each strain (or “population of host cells”) in the library has been genetically modified to modulate the expression of one or more target genes in the host cell. An “optimal host strain” or “optimal expression system” can be identified or selected based on the quantity, quality, and/or location of the expressed protein of interest compared to other populations of phenotypically distinct host cells in the array. Thus, an optimal host strain is the strain that produces the polypeptide of interest according to a desired specification. While the desired specification will vary depending on the polypeptide being produced, the specification includes the quality and/or quantity of protein, whether the protein is sequestered or secreted, protein folding, and the like. For example, the optimal host strain or optimal expression system produces a yield, characterized by the amount of soluble heterologous protein, the amount of recoverable heterologous protein, the amount of properly processed heterologous protein, the amount of properly folded heterologous protein, the amount of active heterologous protein, and/or the total amount of heterologous protein, of a certain absolute level or a certain level relative to that produced by an indicator strain, i.e., a strain used for comparison.

“Heterologous,” “heterologously expressed,” or “recombinant” generally refers to a gene or protein that is not endogenous to the host cell or is not endogenous to the location in the native genome in which it is present, and has been added to the cell by infection, transfection, microinjection, electroporation, microprojection, or the like.

One or more of the host cell populations in the array is modified to modulate the expression of one or more target genes in the host cell. By “target gene” is intended a gene that affects heterologous protein production in a host cell. Target genes that affect heterologous protein production include genes encoding proteins that modulate expression, activity, solubility, translocation, proteolytic degradation and/or cleavage of the heterologous protein. For example, a target gene may encode at least one of a host cell protease, a protein folding modulator, a transcription factor, a translation factor, a secretion modulator, or any other protein involved in the proper transcription, translation, processing, and/or translocation of a heterologous protein of interest. A “target protein” refers to the protein or polypeptide resulting from expression of the target gene. Expression and/or activity of a target gene or genes is increased or decreased, depending on the function of the target gene or protein. For example, expression of one or more host cell proteases may be decreased, whereas expression of one or more protein folding modulators may be increased.

The arrays described herein are useful for rapidly identifying an optimal host cell for production of a heterologous protein or peptide of interest. Heterologous protein production often leads to the formation of insoluble or improperly folded proteins, which are difficult to recover and may be inactive. Furthermore, the presence of specific host cell proteases may degrade the protein of interest and thus reduce the final yield. There is no single host cell population that will optimally produce all polypeptides or proteins of interest. Thus, using the compositions and methods of the invention, an optimal host cell can be rapidly and efficiently identified from the library of modified cell populations. The optimal host strain can then be used to produce sufficient amounts of the protein of interest or for commercial production. Likewise, a host strain can be modified for expression of the protein of interest based on the optimal host strain.

In one embodiment, the method includes obtaining an array comprising at least a first and a second population of P. fluorescens cells, wherein each population is selected from the group consisting of (i) a population of P. fluorescens cells that has been genetically modified to reduce the expression of at least one target gene involved in protein degradation; (ii) a population of P. fluorescens cells that has been genetically modified to increase the expression of at least one target gene involved in protein production; and, (iii) a population of P. fluorescens cells that has been genetically modified to reduce the expression of at least one target gene involved in protein degradation and to increase the expression of at least one target gene involved in protein production; introducing into at least one cell of each population an expression construct comprising at least one gene encoding at least one heterologous protein of interest; maintaining said cells under conditions sufficient for the expression of said protein of interest in at least one population of cells; and selecting the optimal population of cells in which the heterologous protein of interest is produced; wherein each population in the array is non-identical and wherein each population is physically separate one from another; wherein the heterologous protein of interest exhibits one or more of improved expression, improved activity, improved solubility, improved translocation, or reduced proteolytic degradation or cleavage in the optimal population of cells compared to other populations in the array.

The array may further comprise a population of host cells (e.g., P. fluorescens host cells) that has not been genetically modified to alter the expression of a host cell protease or a protein folding modulator. This population may be a wild-type strain, or may be a strain that has been genetically modified to alter the expression of one or more genes not involved in protein production, processing, or translocation (e.g., may be genetically modified to express, for example, a selectable marker gene).

In one embodiment, each population of P. fluorescens host cells is phenotypically distinct (i.e., “non-identical”) one from another. By “phenotypically distinct” is intended that each population produces a measurably different amount of one or more target proteins. In this embodiment, each strain has been genetically modified to alter the expression of one or more different target genes. Where the expression of more than one target gene is modulated in a population of host cells, then the combination of target genes is phenotypically distinct from other populations in the library. An array comprising a plurality of phenotypically distinct populations of host cells according to the present invention is one that provides a diverse population from which to select one or more strains useful for producing a heterologous protein or peptide of interest. It will be understood by one of skill in the art that such an array may also comprise replicates (e.g., duplicates, triplicates, etc.) of any one or more populations of host cells.

In embodiments, structural characteristics of the recombinant protein guide the selection of expression vector elements. The expression vector elements can in turn influence selection of the host cell population. For example, a recombinant protein having multiple cysteine residues can have a propensity to misfold due to disulfide mispairing. Using the methods of the present invention, an array that includes at least one expression vector having a periplasmic secretion leader is assembled, and in turn that expression vector is paired with a host cell population that overexpresses a periplasmic chaperone. The host strain element thus can act synergistically with the vector element to increase expression of the recombinant protein. Thus, in embodiments, an array of the present invention is assembled using different combinations of potentially synergistic expression vector and host cell elements.

In embodiments, a heterologous protein containing more than one disulfide bond, or more than two cysteine residues, can be screened in expression systems wherein the host strain is, e.g., a disulfide isomerase/oxidoreductase pathway overexpressor. In addition to the number of cysteine residues available to form disulfide bonds, a heterologous protein can be evaluated to determine the presence of clustered prolines, the requirement of an N-terminal methionine for activity, or the presence of a small amino acid in the plus two position. In embodiments, identification of the presence of clustered prolines or several prolines within relatively close proximity indicates the use of a 2+ peptidyl-prolyl cis-trans isomerase (PPIase) overexpression host cell population. In other embodiments, a host cell population that has at least one defect in at least one methionyl amino peptidase is included in the array when the heterologous protein is determined to require an N-terminal methionine. In still other embodiments, a host cell population that has at least one defect in at least one amino peptidase is included in an expression system of the array when the presence of a small amino acid in the plus two position of the heterologous protein is identified.
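For illustration only, the sequence-based evaluation described above can be sketched as a small set of rules applied to a one-letter amino acid sequence. Several details are assumptions made for this sketch and are not stated in the specification: "clustered prolines" is taken to mean at least three prolines within any five-residue window, the "plus two position" is taken to be the second residue of the sequence, and the function names and the example sequences are hypothetical. The cysteine thresholds and the list of small amino acids follow the Summary above.

```python
# Small amino acids per the Summary: Gly, Ala, Val, Ser, Thr, Asp, Asn, Pro.
SMALL_RESIDUES = set("GAVSTDNP")


def has_clustered_prolines(seq, window=5, min_count=3):
    """Assumed definition: min_count prolines inside any window of residues."""
    last_start = max(1, len(seq) - window + 1)
    return any(seq[i:i + window].count("P") >= min_count
               for i in range(last_start))


def suggest_array_elements(seq, requires_n_terminal_met=False):
    """Return host or vector elements worth including in the array."""
    seq = seq.upper()
    suggestions = []
    n_cys = seq.count("C")
    if n_cys > 2:
        suggestions.append("host overexpressing a disulfide isomerase/oxidoreductase")
    if n_cys > 4:
        suggestions.append("vector with a periplasmic secretion leader")
    if has_clustered_prolines(seq):
        suggestions.append("host overexpressing a PPIase")
    if requires_n_terminal_met:
        suggestions.append("host defective in a methionyl amino peptidase")
    # Assumption: index 1 (second residue) is the plus two position.
    if len(seq) >= 2 and seq[1] in SMALL_RESIDUES:
        suggestions.append("host defective in an amino peptidase")
    return suggestions


# Hypothetical 10-residue sequence: 4 cysteines, a PPP cluster, Ala at position 2.
for item in suggest_array_elements("MACCCPPPKC"):
    print(item)
```

The rules are additive rather than exclusive, which mirrors the text: a protein can call for several host and vector elements at once, and each is simply added to the array.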

The heterologous protein can also be evaluated for a propensity for protease degradation, and a host cell population having one or more protease mutations can be used in the array. Furthermore, if a cleavage site for a specific protease is identified, a host having a mutation in the protease(s) which cleaves at that site can be included in the array. Useful host cell populations can contain multiple protease mutations, multiple folding modulators, or both protease mutations and folding modulators. In embodiments, a host cell population that has at least one to at least eight different protease mutations is used in an expression system of the array.

Variation of the expression systems of the invention at multiple interdependent levels allows fine-tuning of expression, which in conjunction with rapid screening capabilities provides a powerful tool for identifying overexpression systems for any protein.

Arrays

Provided herein is an array of host cell populations (i.e., “strain array”) which can be rapidly screened to identify certain strain(s) with improved yield and/or quality of heterologous protein. As used herein, the term “strain array” refers to a plurality of addressed or addressable locations (e.g., wells, such as deep wells or microwells). The location of each of the microwells or groups of microwells in the array is typically known, so as to allow for identification of the optimal host cell for expression of the heterologous protein of interest.

The strain array comprises a plurality of phenotypically distinct host strains. The arrays may be low-density arrays or high-density arrays and may contain about 2 or more, about 4 or more, about 8 or more, about 12 or more, about 16 or more, about 20 or more, about 24 or more, about 32 or more, about 40 or more, about 48 or more, about 64 or more, about 72 or more, about 80 or more, about 96 or more, about 192 or more, or about 384 or more host cell populations.
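As a minimal sketch of what "addressed or addressable locations" could look like in software, the following Python assigns each host strain to a known well address so that a result in a given well can be traced back to its strain. The function names, the row-major assignment order, and the strain names are assumptions for illustration only.

```python
from string import ascii_uppercase


def well_labels(rows=8, columns=12):
    """Generate well addresses (A1 ... H12 for the default 96-well layout)."""
    return [
        f"{ascii_uppercase[r]}{c}"
        for r in range(rows)
        for c in range(1, columns + 1)
    ]


def build_strain_array(strains, rows=8, columns=12):
    """Assign each host strain to one addressed well, in row-major order."""
    labels = well_labels(rows, columns)
    if len(strains) > len(labels):
        raise ValueError("more strains than wells on the plate")
    return dict(zip(labels, strains))


# Hypothetical strain names used only to illustrate the mapping.
strain_array = build_strain_array(["strain_001", "strain_002", "strain_003"])
print(strain_array)
```

A 384-well layout would correspond to `rows=16, columns=24` in this sketch.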

The host cell populations of the invention can be maintained and/or screened in a multi-well or deep well vessel. The vessel may contain any desired number of wells; however, a miniaturized cell culture microarray platform is useful for screening each population of host cells individually and simultaneously using minimal reagents and a relatively small number of cells. A typical multi-well, microtiter vessel useful in this assay is a multi-well plate including, without limitation, 10-well plates, 28-well plates, 96-well plates, 384-well plates, and plates having greater than 384 wells. Alternatively, an array of tubes, holders, cartridges, minitubes, microfuge tubes, cryovials, square well plates, tubes, plates, slants, or culture flasks may also be used, depending on the volume desired.

The vessel may be made of any material suitable for culturing and/or screening a host cell of interest, e.g., Pseudomonas. For example, the vessel can be a material that can be easily sterilized such as plastic or other artificial polymer material, so long as the material is biocompatible. Any number of materials can be used, including, but not limited to, polystyrene; polypropylene; polyvinyl compounds (e.g., polyvinylchloride (PVC)); polycarbonate; polytetrafluoroethylene (PTFE); polyglycolic acid (PGA); cellulose; glass; fluoropolymers; fluorinated ethylene propylene; polyvinylidene; polydimethylsiloxane; silicon; and the like.

Automated transformation of cells and automated colony pickers will facilitate rapid screening of desired cells. The arrays may be created and/or screened using a spotter device (e.g., automated robotic devices) as known in the art.

Target Genes

The strain array of the present invention comprises a plurality of phenotypically and genotypically distinct host cell populations, wherein each population in the array has been genetically modified to modulate the expression of one or more target genes in the host cell. By “target gene” is intended a gene that affects heterologous protein production in a host cell. A target gene may encode a host cell protease or an endogenous or exogenous protein folding modulator, transcription factor, translation factor, secretion modulator, or any other gene involved in the proper expression, processing, and/or translocation of a heterologous protein of interest. A “target protein” refers to the protein or polypeptide resulting from expression of the target gene. Expression and/or activity of a target gene or genes is increased or decreased, depending on the function of the target gene or protein. A target gene can be endogenous to the host cell, or can be a gene that is heterologously expressed in each of the host cell populations in the array.

In one embodiment, the target gene or genes is at least one protein folding modulator, putative protein folding modulator, or a cofactor or subunit of a folding modulator. In some embodiments, the target gene or genes can be selected from a chaperone protein, a foldase, a peptidyl prolyl isomerase and a disulfide bond isomerase. In some embodiments, the target gene or genes can be selected from htpG, cbpA, dnaJ, dnaK and fkbP. Exemplary protein folding modulators from P. fluorescens are listed in Table 1.

In other embodiments, the target gene comprises at least one putative protease, a protease-like protein, or a cofactor or subunit of a protease. For example, the target gene or genes can be a serine, threonine, cysteine, aspartic or metallopeptidase. In one embodiment, the target gene or genes can be selected from hslV, hslU, clpA, clpB and clpX. The target gene can also be a cofactor of a protease. Exemplary proteases from P. fluorescens are listed in Table 2. Proteases from a variety of organisms can be found in the MEROPS Peptidase Database maintained by the Wellcome Trust Sanger Institute, Cambridge, UK (see the website address merops.sanger.ac.uk/).

Protein Folding Modulators

Another major obstacle in the production of heterologous proteins in host cells is that the cell often is not adequately equipped to produce either soluble or active protein. While the primary structure of a protein is defined by its amino acid sequence, the secondary structure is defined by the presence of alpha helices or beta sheets, and the tertiary structure by covalent bonds between adjacent protein stretches, such as disulfide bonds. When expressing heterologous proteins, particularly in large-scale production, the secondary and tertiary structure of the protein itself is of critical importance. Any significant change in protein structure can yield a functionally inactive molecule, or a protein with significantly reduced biological activity. In many cases, a host cell expresses protein folding modulators (PFMs) that are necessary for proper production of active heterologous protein. However, at the high levels of expression generally required to produce usable, economically satisfactory biotechnology products, a cell often cannot produce enough native protein folding modulator or modulators to process the heterologously-expressed protein.

In certain expression systems, overproduction of heterologous proteins can be accompanied by their misfolding and segregation into insoluble aggregates. In bacterial cells these aggregates are known as inclusion bodies. In E. coli, the network of folding modulators/chaperones includes the Hsp70 family. The major Hsp70 chaperone, DnaK, efficiently prevents protein aggregation and supports the refolding of damaged proteins.

The incorporation of heat shock proteins into protein aggregates can facilitate disaggregation. However, proteins processed to inclusion bodies can, in certain cases, be recovered through additional processing of the insoluble fraction. Proteins found in inclusion bodies typically have to be purified through multiple steps, including denaturation and renaturation. Typical renaturation processes for inclusion body targeted proteins involve attempts to dissolve the aggregate in concentrated denaturant and subsequent removal of the denaturant by dilution. Aggregates are frequently formed again in this stage. The additional processing adds cost, there is no guarantee that the in vitro refolding will yield biologically active product, and the recovered proteins can include large amounts of fragment impurities.

The recent realization that in vivo protein folding is assisted by molecular chaperones, which promote the proper isomerization and cellular targeting of other polypeptides by transiently interacting with folding intermediates, and by foldases, which accelerate rate-limiting steps along the folding pathway, has provided additional approaches to combat the problem of inclusion body formation (see, e.g., Thomas J G et al. (1997) Appl Biochem Biotechnol 66:197-238).

In certain cases, the overexpression of chaperones has been found to increase the soluble yields of aggregation-prone proteins (see Baneyx, F. (1999) Curr. Opin. Biotech. 10:411-421 and references therein). The beneficial effect associated with an increase in the intracellular concentration of these chaperones appears highly dependent on the nature of the overproduced protein, and may not require overexpression of the same protein folding modulator(s) for all heterologous proteins.

Protein folding modulators, including chaperones, disulfide bond isomerases, and peptidyl-prolyl cis-trans isomerases (PPIases), are a class of proteins present in all cells which aid in the folding, unfolding and degradation of nascent polypeptides.

Chaperones act by binding to nascent polypeptides, stabilizing them and allowing them to fold properly. Proteins possess both hydrophobic and hydrophilic residues; the latter are usually exposed on the surface while the former are buried within the structure, where they interact with other hydrophobic residues rather than the water which surrounds the molecule. However, in folding polypeptide chains, the hydrophobic residues are often exposed for some period of time as the protein exists in a partially folded or misfolded state. It is during this time that the forming polypeptides can become permanently misfolded or interact with other misfolded proteins and form large aggregates or inclusion bodies within the cell. Chaperones generally act by binding to the hydrophobic regions of the partially folded chains and preventing them from misfolding completely or aggregating with other proteins. Chaperones can even bind to proteins in inclusion bodies and allow them to disaggregate (Ranson et al. 1998). The GroES/EL, DnaKJ, Clp, Hsp90 and SecB families of folding modulators are all examples of proteins with chaperone-like activity.

Another important type of folding modulator is the disulfide bond isomerases. These proteins catalyze a very specific set of reactions to help folding polypeptides form the proper intra-protein disulfide bonds. Any protein that has more than two cysteines is at risk of forming disulfide bonds between the wrong residues. The disulfide bond formation family consists of the Dsb proteins, which catalyze the formation of disulfide bonds in the non-reducing environment of the periplasm. When a periplasmic polypeptide misfolds, the disulfide bond isomerase DsbC is capable of rearranging the disulfide bonds and allowing the protein to reform with the correct linkages.

The proline residue is unique among amino acids in that the peptidyl bond immediately preceding it can adopt either a cis or trans conformation. For all other amino acids the cis conformation is not favored due to steric hindrance. Peptidyl-prolyl cis-trans isomerases (PPIases) catalyze the conversion of this bond from one form to the other. This isomerization may aid in protein folding, refolding, assembly of subunits and trafficking in the cell (Dolinski et al. 1997).

In addition to the general chaperones, which seem to interact with proteins in a non-specific manner, there are also chaperones which aid in the folding of specific targets. These protein-specific chaperones form complexes with their targets, preventing aggregation and degradation and allowing time for them to assemble into multi-subunit structures. The PapD chaperone is one well known example of this type (Lombardo et al. 1997).

Folding modulators also include, for example, HSP70 proteins, HSP110/SSE proteins, HSP40 (DNAJ-related) proteins, GRPE-like proteins, HSP90 proteins, CPN60 and CPN10 proteins, Cytosolic chaperonins, HSP100 proteins, Small HSPs, Calnexin and calreticulin, PDI and thioredoxin-related proteins, Peptidyl-prolyl isomerases, Cyclophilin PPIases, FK-506 binding proteins, Parvulin PPIases, Individual chaperonins, Protein specific chaperones, or intramolecular chaperones. Folding modulators are generally described in “Guidebook to Molecular Chaperones and Protein-Folding Catalysts” (1997) ed. M. Gething, Melbourne University, Australia.

The best characterized molecular chaperones in the cytoplasm of E. coli are the ATP-dependent DnaK-DnaJ-GrpE and GroEL-GroES systems. Based on in vitro studies and homology considerations, a number of additional cytoplasmic proteins have been proposed to function as molecular chaperones in E. coli. These include ClpB, HtpG and IbpA/B, which, like DnaK-DnaJ-GrpE and GroEL-GroES, are heat-shock proteins (Hsps) belonging to the stress regulon. The trans conformation of X-Pro bonds is energetically favored in nascent protein chains; however, approximately 5% of all prolyl peptide bonds are found in a cis conformation in native proteins. The trans to cis isomerization of X-Pro bonds is rate limiting in the folding of many polypeptides and is catalyzed in vivo by peptidyl prolyl cis/trans isomerases (PPIases). Three cytoplasmic PPIases, SlyD, SlpA and trigger factor (TF), have been identified to date in E. coli. TF is a 48 kDa protein associated with 50S ribosomal subunits that has been postulated to cooperate with chaperones in E. coli to guarantee proper folding of newly synthesized proteins. At least five proteins (thioredoxins 1 and 2, and glutaredoxins 1, 2 and 3, the products of the trxA, trxC, grxA, grxB and grxC genes, respectively) are involved in the reduction of disulfide bridges that transiently arise in cytoplasmic enzymes. Thus, target genes can be disulfide bond forming proteins or chaperones that allow proper disulfide bond formation.

TABLE 1 P. fluorescens strain MB214 protein folding modulators ORF IDGENE FUNCTION FAMILY LOCATION GroES/EL RXF02095.1 groES Chaperone Hsp10Cytoplasmic RXF06767.1:: groEL Chaperone Hsp60 Cytoplasmic Rxf02090RXF01748.1 ibpA Small heat-shock protein (sHSP) IbpA Hsp20 CytoplasmicPA3126; Acts as a holder for GroESL folding RXF03385.1 hscB Chaperoneprotein hscB Hsp20 Cytoplasmic Hsp70 (DnaK/J) RXF05399.1 dnaK ChaperoneHsp70 Periplasmic RXF06954.1 dnaK Chaperone Hsp70 Cytoplasmic RXF03376.1hscA Chaperone Hsp70 Cytoplasmic RXF03987.2 cbpA Curved dna-bindingprotein, dnaJ like activity Hsp40 Cytoplasmic RXF05406.2 dnaJ Chaperoneprotein dnaJ Hsp40 Cytoplasmic RXF03346.2 dnaJ Molecular chaperones(DnaJ family) Hsp40 Non-secretory RXF05413.1 grpE heat shock proteinGrpE PA4762 GrpE Cytoplasmic Hsp100 (Clp/Hsl) RXF04587.1 clpAatp-dependent clp protease atp-binding subunit Hsp100 Cytoplasmic clpARXF08347.1 clpB ClpB protein Hsp100 Cytoplasmic RXF04654.2 clpXatp-dependent clp protease atp-binding subunit Hsp100 Cytoplasmic clpXRXF04663.1 clpP atp-dependent Clp protease proteolytic subunit MEROPSCytoplasmic (ec 3.4.21.92) peptidase family S14 RXF01957.2 hslUatp-dependent hsl protease atp-binding subunit Hsp100 Cytoplasmic hslURXF01961.2 hslV atp-dependent hsl protease proteolytic subunit MEROPSCytoplasmic peptidase subfamily T1B Hsp33 RXF04254.2 yrfI 33 kDachaperonin (Heat shock protein 33 Hsp33 Cytoplasmic homolog) (HSP33).Hsp90 RXF05455.2 htpG Chaperone protein htpG Hsp90 Cytoplasmic SecBRXF02231.1 secB secretion specific chaperone SecB SecB Non-secretoryDisulfide Bond Isomerases RXF07017.2 dsbA disulfide isomerase DSBAoxido- Cytoplasmic reductase RXF08657.2 dsbA/ disulfide isomerase DSBAoxido- Cytoplasmic dsbC/ reductase dsbG/ fernA RXF01002.1 dsbA/disulfide isomerase DSBA oxido- Periplasmic dsbC reductase/ ThioredoxinRXF03307.1 dsbC disulfide isomerase Glutaredoxin/ PeriplasmicThioredoxin RXF04890.2 dsbG disulfide isomerase Glutaredoxin/Periplasmic Thioredoxin RXF03204.1 dsbB 
Disulfide bond formation proteinB (Disulfide DSBA oxido- Periplasmic oxidoreductase). reductaseRXF04886.2 dsbD Thiol:disulfide interchange protein dsbD DSBA oxido-Periplasmic reductase Peptidyl-prolyl cis-trans isomerases RXF03768.1ppiA Peptidyl-prolyl cis-trans isomerase A (ec 5.2.1.8) PPIase:Periplasmic cyclophilin type RXF05345.2 ppiB Peptidyl-prolyl cis-transisomerase B. PPIase: Cytoplasmic cyclophilin type RXF06034.2 fklBPeptidyl-prolyl cis-trans isomerase FklB. PPIase: OuterMembrane FKBPtype RXF06591.1 fklB/ fk506 binding protein Peptidyl-prolyl cis-transPPIase: Periplasmic fkbP isomerase (EC 5.2.1.8) FKBP type RXF05753.2fklB; Peptidyl-prolyl cis-trans isomerase (ec 5.2.1.8) PPIase:OuterMembrane fkbP FKBP type RXF01833.2 slyD Peptidyl-prolyl cis-transisomerase SlyD. PPIase: Non-secretory FKBP type RXF04655.2 tig Triggerfactor, ppiase (ec 5.2.1.8) PPIase: Cytoplasmic FKBP type RXF05385.1yaad Probable FKBP-type 16 kDa peptidyl-prolyl cis- PPIase:Non-secretory trans isomerase (EC 5.2.1.8) (PPiase) FKBP type(Rotamase). 
RXF00271.1 Peptidyl-prolyl cis-trans isomerase (ec 5.2.1.8)PPIase: Non-secretory FKBP type pili assembly chaperones (papD like)RXF06068.1 cup Chaperone protein cup pili assembly Periplasmic papDRXF05719.1 ecpD Chaperone protein ecpD pili assembly Signal peptide papDRXF05319.1 ecpD Hnr protein pili assembly Periplasmic chaperoneRXF03406.2 ecpD; Chaperone protein ecpD pili assembly Signal peptidecsuC papD RXF04296.1 ecpD; Chaperone protein ecpD pili assemblyPeriplasmic cup papD RXF04553.1 ecpD; Chaperone protein ecpD piliassembly Periplasmic cup papD RXF04554.2 ecpD; Chaperone protein ecpDpili assembly Periplasmic cup papD RXF05310.2 ecpD; Chaperone proteinecpD pili assembly Periplasmic cup papD RXF05304.1 ecpD; Chaperoneprotein ecpD pili assembly Periplasmic cup papD RXF05073.1 gltFGram-negative pili assembly chaperone pili assembly Signal peptideperiplasmic function papD Type II Secretion Complex RXF05445.1 YacJHistidinol-phosphate aminotransferase (ec Class-II pyridoxal- Membrane2.6.1.9) phosphate-dependent aminotransferase family. Histidinol-phosphate amino- transferase subfamily. RXF05426.1 SecD Proteintranslocase subunit secd Type II Membrane secretion complex RXF05432.1SecF protein translocase subunit secf Type II Membrane secretion complexDisulfide Bond Reductases RXF08122.2 trxC Thioredoxin 2 DisulfideCytoplasmic Bond Reductase RXF06751.1 Gor Glutathione reductase (EC1.8.1.7) (GR) (GRase) Disulfide Cytoplasmic PA2025 Bond ReductaseRXF00922.1 gshA Glutamate--cysteine ligase (ec 6.3.2.2) PA5203 DisulfideCytoplasmic Bond Reductase

Protease

Unwanted degradation of heterologously-expressed protein presents an obstacle to the efficient use of certain expression systems. When a cell is modified to produce large quantities of a target protein, the cell is placed under stress and often reacts by inducing or suppressing other proteins. The stress that a host cell undergoes during production of heterologous proteins can increase expression of, for example, specific proteins or cofactors to cause degradation of the overexpressed heterologous protein. The increased expression of compensatory proteins can be counterproductive to the goal of expressing high levels of active, full-length heterologous protein. Decreased expression or lack of adequate expression of other proteins can cause misfolding and aggregation of the heterologously-expressed protein. While it is known that a cell under stress will change its profile of protein expression, not all heterologously expressed proteins will modulate expression of the same proteins in a particular host cell.

Thus, the optimal host strain, e.g., P. fluorescens host strain, can be identified using an array comprising a plurality of host cell populations that have been genetically engineered to decrease the expression of one or more protease enzymes. In one embodiment, one or more host cell populations is modified by reducing the expression of, inhibiting or removing at least one protease from the genome. The modification can also be to more than one protease. In a related embodiment, the cell is modified by reducing the expression of a protease cofactor or protease protein. In another embodiment, the host cell is modified by inhibition of a promoter for a protease or related protein, which can be a native promoter. Alternatively, the gene modification can be to modulate a protein homologous to the target gene.

The array comprising the modified host strains can be screened by expressing the heterologous protein(s) of interest and assessing the quality and/or quantity of protein production as discussed infra. Alternatively, an isolate of the heterologous protein of interest can be independently incubated with lysate collected from each of the protease-deficient host cell populations and the level of proteolytic degradation can be used to identify the optimal host cell. In this embodiment, the optimal host cell population is that which results in the least amount of heterologous protein degradation. Thus, in one embodiment, lysate from the optimal host cell population degrades less than about 50% of the heterologous protein, less than about 45%, less than about 40%, less than about 35%, less than about 30%, less than about 25%, less than about 20%, less than about 10%, less than about 5%, less than about 4%, about 3%, about 2%, about 1%, or less of the protein.
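The lysate-based selection described above can be sketched as a simple ranking, assuming that the amount of intact heterologous protein is measured before and after incubation with each lysate. The function names, host names and values below are hypothetical and serve only to show the calculation; they are not data from this specification.

```python
def percent_degraded(intact_before, intact_after):
    """Percentage of the heterologous protein lost during lysate incubation."""
    if intact_before <= 0:
        raise ValueError("intact_before must be positive")
    return 100.0 * (intact_before - intact_after) / intact_before


def rank_hosts(measurements):
    """Sort hosts from least to most degradation.

    measurements maps a host name to (intact_before, intact_after),
    in any consistent unit (an assumption of this sketch).
    """
    scored = {
        host: percent_degraded(before, after)
        for host, (before, after) in measurements.items()
    }
    return sorted(scored.items(), key=lambda item: item[1])


# Hypothetical measurements, for illustration only.
example = {
    "host_A": (100.0, 40.0),
    "host_B": (100.0, 90.0),
    "host_C": (100.0, 75.0),
}
ranking = rank_hosts(example)
best_host, best_loss = ranking[0]
print(best_host, best_loss)
```

The first entry of the ranking corresponds to the host cell population whose lysate degrades the least heterologous protein, which can then be compared against a chosen threshold such as 50%.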

Exemplary target protease genes include those proteases classified as Aminopeptidases; Dipeptidases; Dipeptidyl-peptidases and tripeptidyl peptidases; Peptidyl-dipeptidases; Serine-type carboxypeptidases; Metallocarboxypeptidases; Cysteine-type carboxypeptidases; Omegapeptidases; Serine proteinases; Cysteine proteinases; Aspartic proteinases; Metallo proteinases; or Proteinases of unknown mechanism.

Aminopeptidases include cytosol aminopeptidase (leucyl aminopeptidase), membrane alanyl aminopeptidase, cystinyl aminopeptidase, tripeptide aminopeptidase, prolyl aminopeptidase, arginyl aminopeptidase, glutamyl aminopeptidase, x-pro aminopeptidase, bacterial leucyl aminopeptidase, thermophilic aminopeptidase, clostridial aminopeptidase, cytosol alanyl aminopeptidase, lysyl aminopeptidase, x-trp aminopeptidase, tryptophanyl aminopeptidase, methionyl aminopeptidase, d-stereospecific aminopeptidase, aminopeptidase ey. Dipeptidases include x-his dipeptidase, x-arg dipeptidase, x-methyl-his dipeptidase, cys-gly dipeptidase, glu-glu dipeptidase, pro-x dipeptidase, x-pro dipeptidase, met-x dipeptidase, non-stereospecific dipeptidase, cytosol non-specific dipeptidase, membrane dipeptidase, beta-ala-his dipeptidase. Dipeptidyl-peptidases and tripeptidyl peptidases include dipeptidyl-peptidase i, dipeptidyl-peptidase ii, dipeptidyl peptidase iii, dipeptidyl-peptidase iv, dipeptidyl-dipeptidase, tripeptidyl-peptidase I, tripeptidyl-peptidase II. Peptidyl-dipeptidases include peptidyl-dipeptidase a and peptidyl-dipeptidase b. Serine-type carboxypeptidases include lysosomal pro-x carboxypeptidase, serine-type D-ala-D-ala carboxypeptidase, carboxypeptidase C, carboxypeptidase D. Metallocarboxypeptidases include carboxypeptidase a, carboxypeptidase B, lysine(arginine) carboxypeptidase, gly-X carboxypeptidase, alanine carboxypeptidase, muramoylpentapeptide carboxypeptidase, carboxypeptidase h, glutamate carboxypeptidase, carboxypeptidase M, muramoyltetrapeptide carboxypeptidase, zinc d-ala-d-ala carboxypeptidase, carboxypeptidase A2, membrane pro-x carboxypeptidase, tubulinyl-tyr carboxypeptidase, carboxypeptidase t. Omegapeptidases include acylaminoacyl-peptidase, peptidyl-glycinamidase, pyroglutamyl-peptidase I, beta-aspartyl-peptidase, pyroglutamyl-peptidase II, n-formylmethionyl-peptidase, pteroylpoly-[gamma]-glutamate carboxypeptidase, gamma-glu-X carboxypeptidase, acylmuramoyl-ala peptidase. Serine proteinases include chymotrypsin, chymotrypsin c, metridin, trypsin, thrombin, coagulation factor Xa, plasmin, enteropeptidase, acrosin, alpha-lytic protease, glutamyl endopeptidase, cathepsin G, coagulation factor viia, coagulation factor ixa, cucumisin, prolyl oligopeptidase, coagulation factor xia, brachyurin, plasma kallikrein, tissue kallikrein, pancreatic elastase, leukocyte elastase, coagulation factor xiia, chymase, complement component c1r55, complement component c1s55, classical-complement pathway c3/c5 convertase, complement factor I, complement factor D, alternative-complement pathway c3/c5 convertase, cerevisin, hypodermin C, lysyl endopeptidase, endopeptidase La, gamma-renin, venombin ab, leucyl endopeptidase, tryptase, scutelarin, kexin, subtilisin, oryzin, endopeptidase k, thermomycolin, thermitase, endopeptidase SO, T-plasminogen activator, protein C, pancreatic endopeptidase E, pancreatic elastase ii, IGA-specific serine endopeptidase, U-plasminogen activator, venombin A, furin, myeloblastin, semenogelase, granzyme A or cytotoxic T-lymphocyte proteinase 1, granzyme B or cytotoxic T-lymphocyte proteinase 2, streptogrisin A, streptogrisin B, glutamyl endopeptidase II, oligopeptidase B, limulus clotting factor c, limulus clotting factor, limulus clotting enzyme, omptin, repressor lexa, bacterial leader peptidase I, togavirin, flavirin. Cysteine proteinases include cathepsin B, papain, ficin, chymopapain, asclepain, clostripain, streptopain, actinidain, cathepsin L, cathepsin H, calpain, cathepsin t, glycyl endopeptidase, cancer procoagulant, cathepsin S, picornain 3C, picornain 2A, caricain, ananain, stem bromelain, fruit bromelain, legumain, histolysain, interleukin 1-beta converting enzyme. Aspartic proteinases include pepsin A, pepsin B, gastricsin, chymosin, cathepsin D, nepenthesin, renin, retropepsin, pro-opiomelanocortin converting enzyme, aspergillopepsin I, aspergillopepsin II, penicillopepsin, rhizopuspepsin, endothiapepsin, mucoropepsin, candidapepsin, saccharopepsin, rhodotorulapepsin, physaropepsin, acrocylindropepsin, polyporopepsin, pycnoporopepsin, scytalidopepsin a, scytalidopepsin b, xanthomonapepsin, cathepsin e, barrierpepsin, bacterial leader peptidase I, pseudomonapepsin, plasmepsin. Metallo proteinases include atrolysin a, microbial collagenase, leucolysin, interstitial collagenase, neprilysin, envelysin, iga-specific metalloendopeptidase, procollagen N-endopeptidase, thimet oligopeptidase, neurolysin, stromelysin 1, meprin A, procollagen C-endopeptidase, peptidyl-lys metalloendopeptidase, astacin, stromelysin 2, matrilysin, gelatinase, aeromonolysin, pseudolysin, thermolysin, bacillolysin, aureolysin, coccolysin, mycolysin, beta-lytic metalloendopeptidase, peptidyl-asp metalloendopeptidase, neutrophil collagenase, gelatinase B, leishmanolysin, saccharolysin, autolysin, deuterolysin, serralysin, atrolysin B, atrolysin C, atroxase, atrolysin E, atrolysin F, adamalysin, horrilysin, ruberlysin, bothropasin, bothrolysin, ophiolysin, trimerelysin I, trimerelysin II, mucrolysin, pitrilysin, insulysin, O-sialoglycoprotein endopeptidase, russellysin, mitochondrial intermediate peptidase, dactylysin, nardilysin, magnolysin, meprin B, mitochondrial processing peptidase, macrophage elastase, choriolysin, toxilysin. Proteinases of unknown mechanism include thermopsin and multicatalytic endopeptidase complex.

Certain proteases can have both protease and chaperone-like activity. When these proteases are negatively affecting protein yield and/or quality it can be useful to delete them, and they can be overexpressed when their chaperone activity may positively affect protein yield and/or quality. These proteases include, but are not limited to: Hsp100 (Clp/Hsl) family members RXF04587.1 (clpA), RXF08347.1, RXF04654.2 (clpX), RXF04663.1, RXF01957.2 (hslU), RXF01961.2 (hslV); Peptidyl-prolyl cis-trans isomerase family member RXF05345.2 (ppiB); Metallopeptidase M20 family member RXF04892.1 (aminohydrolase); Metallopeptidase M24 family members RXF04693.1 (methionine aminopeptidase) and RXF03364.1 (methionine aminopeptidase); and Serine Peptidase S26 signal peptidase I family member RXF01181.1 (signal peptidase).

TABLE 2
P. fluorescens strain MB214 proteases

Class | Family | RXF | Gene | Curated Function | Location
Aspartic Peptidases | A8 (signal peptidase II family) | RXF05383.2 | | Lipoprotein signal peptidase (ec 3.4.23.36) | Cytoplasmic Membrane
 | A24 (type IV prepilin peptidase family) | RXF05379.1 | | type 4 prepilin peptidase pild (ec 3.4.99.-) | Cytoplasmic Membrane
Cysteine Peptidases | C15 (pyroglutamyl peptidase I family) | RXF02161.1 | | Pyrrolidone-carboxylate peptidase (ec 3.4.19.3) | Cytoplasmic
 | C40 | RXF01968.1 | | invasion-associated protein, P60 | Signal peptide
 | | RXF04920.1 | | invasion-associated protein, P60 | Cytoplasmic
 | | RXF04923.1 | | phosphatase-associated protein papq | Signal peptide
 | C56 (PfpI endopeptidase family) | RXF01816.1 | | protease I (ec 3.4.-.-) | Non-secretory
Metallopeptidases | M1 | RXF08773.1 | | Membrane alanine aminopeptidase (ec 3.4.11.2) | Non-secretory
 | M3 | RXF00561.2 | prlC | Oligopeptidase A (ec 3.4.24.70) | Cytoplasmic
 | | RXF04631.2 | | Zn-dependent oligopeptidases | Cytoplasmic
 | M4 (thermolysin family) | RXF05113.2 | | Extracellular metalloprotease precursor (ec 3.4.24.-) | Extracellular
 | M41 (FtsH endopeptidase family) | RXF05400.2 | | Cell division protein ftsH (ec 3.4.24.-) | Cytoplasmic Membrane
 | M10 | RXF04304.1 | | Serralysin (ec 3.4.24.40) | Extracellular
 | | RXF04500.1 | | Serralysin (ec 3.4.24.40) | Extracellular
 | | RXF01590.2 | | Serralysin (ec 3.4.24.40) | Extracellular
 | | RXF04497.2 | | Serralysin (ec 3.4.24.40) | Extracellular
 | | RXF04495.2 | | Serralysin (ec 3.4.24.40) | Extracellular
 | | RXF02796.1 | | Serralysin (ec 3.4.24.40) | Extracellular
 | M14 (carboxypeptidase A family) | RXF09091.1 | | Zinc-carboxypeptidase precursor (ec 3.4.17.-) | Cytoplasmic
 | M16 (pitrilysin family) | RXF03441.1 | | Coenzyme pqq synthesis protein F (ec 3.4.99.-) | Non-secretory
 | | RXF01918.1 | | zinc protease (ec 3.4.99.-) | Signal peptide
 | | RXF01919.1 | | zinc protease (ec 3.4.99.-) | Periplasmic
 | | RXF03699.2 | | processing peptidase (ec 3.4.24.64) | Signal peptide
 | M17 (leucyl aminopeptidase family) | RXF00285.2 | | Cytosol aminopeptidase (ec 3.4.11.1) | Non-secretory
 | M18 | RXF07879.1 | | Aspartyl aminopeptidase (ec 3.4.11.21) | Cytoplasmic
 | M20 | RXF00811.1 | dapE | Succinyl-diaminopimelate desuccinylase (ec 3.5.1.18) | Cytoplasmic
 | | RXF04052.2 | | Xaa-His dipeptidase (ec 3.4.13.3) | Signal peptide
 | | RXF01822.2 | | Carboxypeptidase G2 precursor (ec 3.4.17.11) | Signal peptide
 | | RXF09831.2::RXF04892.1 | | N-acyl-L-amino acid amidohydrolase (ec 3.5.1.14) | Signal peptide
 | M28 (aminopeptidase Y family) | RXF03488.2 | | Alkaline phosphatase isozyme conversion protein precursor (ec 3.4.11.-) | OuterMembrane
 | M42 (glutamyl aminopeptidase family) | RXF05615.1 | | Deblocking aminopeptidase (ec 3.4.11.-) | Non-secretory
 | M22 | RXF05817.1 | | O-sialoglycoprotein endopeptidase (ec 3.4.24.57) | Extracellular
 | | RXF03065.2 | | Glycoprotease protein family | Non-secretory
 | M23 | RXF01291.2 | | Cell wall endopeptidase, family M23/M37 | Signal peptide
 | | RXF03916.1 | | Membrane proteins related to metalloendopeptidases | Signal peptide
 | | RXF09147.2 | | Cell wall endopeptidase, family M23/M37 | Signal peptide
 | M24 | RXF04693.1 | | Methionine aminopeptidase (ec 3.4.11.18) | Cytoplasmic
 | | RXF03364.1 | | Methionine aminopeptidase (ec 3.4.11.18) | Non-secretory
 | | RXF02980.1 | | Xaa-Pro aminopeptidase (ec 3.4.11.9) | Cytoplasmic
 | | RXF06564.1 | | Xaa-Pro aminopeptidase (ec 3.4.11.9) | Cytoplasmic
 | M48 (Ste24 endopeptidase family) | RXF05137.1 | | Heat shock protein HtpX | Cytoplasmic Membrane
 | | RXF05081.1 | | Zinc metalloprotease (ec 3.4.24.-) | Signal peptide
 | M50 (S2P protease family) | RXF04692.1 | | Membrane metalloprotease | Cytoplasmic Membrane
Serine Peptidases | S1 (chymotrypsin family) | RXF01250.2 | | protease do (ec 3.4.21.-) | Periplasmic
 | | RXF07210.1 | | protease do (ec 3.4.21.-) | Periplasmic
 | S8 (subtilisin family) | RXF06755.2 | | serine protease (ec 3.4.21.-) | Non-secretory
 | | RXF08517.1 | | serine protease (ec 3.4.21.-) | Extracellular
 | | RXF08627.2 | | extracellular serine protease (ec 3.4.21.-) | Signal peptide
 | | RXF06281.1 | | Extracellular serine protease precursor (ec 3.4.21.-) | Non-secretory
 | | RXF08978.1 | | extracellular serine protease (ec 3.4.21.-) | OuterMembrane
 | | RXF06451.1 | | serine protease (ec 3.4.21.-) | Signal peptide
 | S9 (prolyl oligopeptidase family) | RXF02003.2 | | Protease ii (ec 3.4.21.83) | Periplasmic
 | | RXF00458.2 | | Hydrolase | Non-secretory
 | S11 (D-Ala-D-Ala carboxypeptidase A family) | RXF04657.2 | | D-alanyl-D-alanine-endopeptidase (ec 3.4.99.-) | Periplasmic
 | | RXF00670.1 | | D-alanyl-D-alanine carboxypeptidase (ec 3.4.16.4) | Cytoplasmic Membrane
 | S13 (D-Ala-D-Ala peptidase C family) | RXF00133.1 | | D-alanyl-meso-diaminopimelate endopeptidase (ec 3.4.-.-) | OuterMembrane
 | | RXF04960.2 | | D-alanyl-meso-diaminopimelate endopeptidase (ec 3.4.-.-) | Signal peptide
 | S14 (ClpP endopeptidase family) | RXF04567.1 | clpP | atp-dependent Clp protease proteolytic subunit (ec 3.4.21.92) | Non-secretory
 | | RXF04663.1 | clpP | atp-dependent Clp protease proteolytic subunit (ec 3.4.21.92) | Cytoplasmic
 | S16 (lon protease family) | RXF04653.2 | | atp-dependent protease La (ec 3.4.21.53) | Cytoplasmic
 | | RXF08653.1 | | atp-dependent protease La (ec 3.4.21.53) | Cytoplasmic
 | | RXF05943.1 | | atp-dependent protease La (ec 3.4.21.53) | Cytoplasmic
 | S24 (LexA family) | RXF00449.1 | | LexA repressor (ec 3.4.21.88) | Non-secretory
 | | RXF03397.1 | | LexA repressor (ec 3.4.21.88) | Cytoplasmic
 | S26 (signal peptidase I family) | RXF01181.1 | | Signal peptidase I (ec 3.4.21.89) | Cytoplasmic Membrane
 | S33 | RXF05236.1 | pip3 | Proline iminopeptidase (ec 3.4.11.5) | Non-secretory
 | | RXF04802.1 | pip1 | Proline iminopeptidase (ec 3.4.11.5) | Non-secretory
 | | RXF04808.2 | pip2 | Proline iminopeptidase (ec 3.4.11.5) | Cytoplasmic
 | S41 (C-terminal processing peptidase family) | RXF06586.1 | | Tail-specific protease (ec 3.4.21.-) | Signal peptide
 | | RXF01037.1 | | Tail-specific protease (ec 3.4.21.-) | Signal peptide
 | S45 | RXF07170.1 | pacB2 | Penicillin acylase (ec 3.5.1.11) | Signal peptide
 | | RXF06399.2 | pacB1 | Penicillin acylase ii (ec 3.5.1.11) | Signal peptide
 | S49 (protease IV family) | RXF06993.2 | | possible protease sohb (ec 3.4.-.-) | Non-secretory
 | | RXF01418.1 | | protease iv (ec 3.4.-.-) | Non-secretory
 | S58 (DmpA aminopeptidase family) | RXF06308.2 | | D-aminopeptidase (ec 3.4.11.19) | Cytoplasmic Membrane
Threonine Peptidases | T1 (proteasome family) | RXF01961.2 | hslV | atp-dependent protease hslV (ec 3.4.25.-) | Cytoplasmic
 | T3 (gamma-glutamyltransferase family) | RXF02342.1 | ggt1 | Gamma-glutamyltranspeptidase (ec 2.3.2.2) | Periplasmic
 | | RXF04424.2 | ggt2 | Gamma-glutamyltranspeptidase (ec 2.3.2.2) | Periplasmic
Unclassified Peptidases | U32 | RXF00428.1 | | protease (ec 3.4.-.-) | Cytoplasmic
 | | RXF02151.2 | | protease (ec 3.4.-.-) | Cytoplasmic
 | U61 | RXF04715.1 | | Muramoyltetrapeptide carboxypeptidase (ec 3.4.17.13) | Non-secretory
 | U62 | RXF04971.2 | pmbA | PmbA protein | Cytoplasmic
 | | RXF04968.2 | | TldD protein | Cytoplasmic
Non MEROPS Proteases | | RXF00325.1 | | Repressor protein C2 | Non-secretory
 | | RXF02689.2 | | Microsomal dipeptidase (ec 3.4.13.19) | Cytoplasmic
 | | RXF02739.1 | | membrane dipeptidase (3.4.13.19) | Signal peptide
 | | RXF03329.2 | | Hypothetical Cytosolic Protein | Cytoplasmic
 | | RXF02492.1 | | Xaa-Pro dipeptidase (ec 3.4.13.9) | Cytoplasmic
 | | RXF04047.2 | | caax amino terminal protease family | Cytoplasmic Membrane
 | | RXF08136.2 | | protease (transglutaminase-like protein) | Cytoplasmic
 | | RXF09487.1 | | Zinc metalloprotease (ec 3.4.24.-) | Non-secretory

Additional Protein Modification Enzymes

In another embodiment, the target gene comprises a gene involved in proper protein processing and/or modification. Common modifications include disulfide bond formation, glycosylation, acetylation, acylation, phosphorylation, and gamma-carboxylation, all of which can regulate protein folding and biological activity. A non-exhaustive list of several classes of enzymes involved in protein processing is found in Table 3. One of skill in the art will recognize how to identify a target gene useful in the host cell chosen for the array, or useful with the heterologous protein of interest, from among the classes of protein modification enzymes listed in Table 3. The target gene may be endogenous to the host cell utilized, may be endogenous to the organism from which the heterologous protein of interest is derived, or may be known to facilitate proper processing of a heterologously expressed protein of interest. It is also recognized that any gene involved in protein production can be targeted according to desired specifications for the heterologous protein of interest.

In embodiments, a target gene is a tmRNA tag-coding region. tmRNAs can add tags to proteins to target them for degradation by a process called trans-translation, as described, e.g., by Dulebohn, D., 2007, “Trans-Translation: The tmRNA-Mediated Surveillance Mechanism for Ribosome Rescue, Directed Protein Degradation, and Nonstop mRNA Decay”, incorporated herein by reference. An exemplary tmRNA sequence is provided as XFRNA203 (SEQ ID NO:157). The sequence of the molecule is shown below, with the tag coding sequence underlined and the TAA stop codon in bold. Deletion or mutation of tmRNA sequences can result in improved heterologous protein yield.

5′-GGGGCCGTTTAGGATTCGACGCCGGTCGCGAAACTTTAGGTGCATGCCGAGTTGGTAACAGAACTCGTAAATCCACTGTTGCAACTTCTTATAGTTGCCAATGACGAAAACTACGGCCAGGAATTCGCTCTCGCTGCG TAAGCAGCCTTAGCCCTGAGCTTCTGGTACCTTCGGGTCCAGCAATCACCAGGGGATGTCTGTAAACCCAAAGTGATTGTCATATAGAACAGAATCGCCGTGCAGTACGTTGTGGACGAAGCGGCTAAAACTTACACAACTCGCCCAAAGCACCCTGCCCTTCGGGTCGCTGAGGGTTAACTTAATAGAAACGGCTACGCATGTAGTACCGACAGCGGAGTACTGGCGGACGGGGGTTCAAATCCCCCCGGCTCCACCA C-3′

TABLE 3
Classes of enzymes involved in protein processing

Class | Examples
Glycosyltransferases (EC 2.4.1.18) | α-glucan-branching glycosyltransferase; enzymatic branching factor; branching glycosyltransferase; enzyme Q; glucosan transglycosylase; glycogen branching enzyme; amylose isomerase; plant branching enzyme; α-1,4-glucan:α-1,4-glucan-6-glycosyltransferase; starch branching enzyme; UDP-N-acetyl-D-galactosamine:polypeptide N-acetylgalactosaminyltransferase; GDP-fucose protein O-fucosyltransferase 2; O-GlcNAc transferase
Histone acetyltransferase (EC 2.3.1.48) | nucleosome-histone acetyltransferase; histone acetokinase; histone acetylase; histone transacetylase; histone deacetylase
Protein kinase (EC 2.7) | non-specific serine/threonine protein kinase; Fas-activated serine/threonine kinase; Goodpasture antigen-binding protein kinase; IκB kinase; cAMP-dependent protein kinase; cGMP-dependent protein kinase; protein kinase C; polo kinase; cyclin-dependent kinase; mitogen-activated protein kinase; mitogen-activated protein kinase kinase kinase; receptor protein serine/threonine kinase; dual-specificity kinase
Phosphatase (EC 3.1.3.48) | protein-tyrosine-phosphatase; phosphotyrosine phosphatase; phosphoprotein phosphatase (phosphotyrosine); phosphotyrosine histone phosphatase; protein phosphotyrosine phosphatase; tyrosylprotein phosphatase; phosphotyrosine protein phosphatase; phosphotyrosylprotein phosphatase; tyrosine O-phosphate phosphatase; PPT-phosphatase; PTPase; [phosphotyrosine]protein phosphatase; PTP-phosphatase

Methods for Modulating the Expression of Target Genes

One or more host cell populations of the array can be modified by any technique known in the art, for example by a technique wherein at least one target gene is knocked out of the genome, or by mutating at least one target gene to reduce expression of the gene, by altering at least one promoter of at least one target gene to reduce expression of the target gene, or by coexpressing (with the heterologous protein or polypeptide of interest) the target gene or an inhibitor of the target gene in the host genome. As discussed supra, the target gene can be endogenous to the host cell populations in the array, or can be heterologously expressed in each of the host cell populations.

The expression of target genes can be increased, for example, by introducing into at least one cell in a host population an expression vector comprising one or more target genes involved in protein production. The target gene expression can also be increased, for example, by mutating a promoter of a target gene. A host cell or organism that expresses a heterologous protein can also be genetically modified to increase the expression of at least one target gene involved in protein production and decrease the expression of at least one target gene involved in protein degradation.

The genome may be modified to modulate the expression of one or more target genes by including an exogenous gene or promoter element in the genome or in the host with an expression vector, by enhancing the capacity of a particular target gene to produce mRNA or protein, by deleting or disrupting a target gene or promoter element, or by reducing the capacity of a target gene to produce mRNA or protein. The genetic code can be altered, thereby affecting transcription and/or translation of a target gene, for example through substitution, deletion (“knock-out”), co-expression, or insertion (“knock-in”) techniques. Additional genes for a desired protein, or regulatory sequences that modulate transcription of an existing target sequence, can also be inserted.

Genome Modification

The genome of the host cell can be modified via a genetic targeting event, which can be by insertion or recombination, for example homologous recombination. Homologous recombination refers to the process of DNA recombination based on sequence homology. Homologous recombination permits site-specific modifications in endogenous genes and thus novel alterations can be engineered into a genome (see, for example, Radding (1982) Ann. Rev. Genet. 16: 405; U.S. Pat. No. 4,888,274).

Various constructs can be prepared for homologous recombination at a target locus. Usually, the construct can include at least 10 bp, 20 bp, 30 bp, 40 bp, 50 bp, 70 bp, 100 bp, 500 bp, 1 kbp, 2 kbp, 4 kbp, 5 kbp, 10 kbp, 15 kbp, 20 kbp, or 50 kbp of sequence homologous with the identified locus. Various considerations can be involved in determining the extent of homology of target gene sequences, such as, for example, the size of the target locus, availability of sequences, relative efficiency of double cross-over events at the target locus and the similarity of the target sequence with other sequences.

The modified gene can include a sequence in which DNA substantially isogenic with a corresponding target sequence in the genome to be modified flanks the desired sequence modifications. The “modified gene” is the sequence being introduced into the genome to alter the expression of a protease or a protein folding modulator in the host cell. The “target gene” is the sequence that is being replaced by the modified gene. The substantially isogenic sequence can be at least about 95%, 97-98%, 99.0-99.5%, 99.6-99.9%, or 100% identical to the corresponding target sequence (except for the desired sequence modifications). The modified gene and the targeted gene can share stretches of DNA at least about 10, 20, 30, 50, 75, 150 or 500 base pairs that are 100% identical.
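The identity figures above can be illustrated with a short calculation. The following is a minimal sketch, assuming the two sequences have already been aligned to equal length; the function names and the example sequences are hypothetical and are not taken from the specification.

```python
def percent_identity(modified: str, target: str) -> float:
    """Percent of aligned positions at which the two sequences match."""
    if len(modified) != len(target) or not modified:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(modified.upper(), target.upper()))
    return 100.0 * matches / len(target)


def longest_identical_run(modified: str, target: str) -> int:
    """Length of the longest stretch of consecutive matching positions."""
    best = run = 0
    for a, b in zip(modified.upper(), target.upper()):
        run = run + 1 if a == b else 0
        best = max(best, run)
    return best


# Hypothetical 20-nt example with a single substitution at position 11.
target_seq = "ATGGCTAGCTAGGCTTACGA"
modified_seq = "ATGGCTAGCTTGGCTTACGA"

identity = percent_identity(modified_seq, target_seq)
run_length = longest_identical_run(modified_seq, target_seq)
print(identity, run_length)
```

In this toy case the sequences are 95% identical and share a 10-base stretch of perfect identity; real homology arms would be far longer.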

Nucleotide constructs can be designed to modify the endogenous target gene product. The modified gene sequence can have one or more deletions, insertions, substitutions or combinations thereof designed to disrupt the function of the resultant gene product. In one embodiment, the alteration can be the insertion of a selectable marker gene fused in reading frame with the upstream sequence of the target gene.

The genome can also be modified using insertional inactivation. In this embodiment, the genome is modified by recombining into the gene a sequence that inhibits gene product formation. This insertion can either disrupt the gene by inserting a separate element, or remove an essential portion of the gene. In one embodiment, the insertional deletion also includes insertion of a gene coding for resistance to a particular stressor, such as an antibiotic, or for growth in a particular medium, for example for production of an essential amino acid.

The genome can also be modified by use of transposons, which are genetic elements capable of inserting at sites in prokaryote genomes by mechanisms independent of homologous recombination. Transposons can include, for example, Tn7, Tn5, or Tn10 in E. coli, Tn554 in S. aureus, IS900 in M. paratuberculosis, IS492 from Pseudomonas atlantica, and IS116 from Streptomyces. Steps believed to be involved in transposition include cleavage of the end of the transposon to yield 3′OH; strand transfer, in which transposase brings together the 3′OH exposed end of transposon and the identified sequence; and a single step transesterification reaction to yield a covalent linkage of the transposon to the identified DNA. The key reaction performed by transposase is generally thought to be nicking or strand exchange; the rest of the process is done by host enzymes.

In one embodiment, the expression or activity of a target gene or protein is increased by incorporating a genetic sequence encoding the target protein or homolog thereof into the genome by recombination. In another embodiment, a promoter is inserted into the genome to enhance the expression of the target gene or homolog. In another embodiment, the expression or activity of a target gene or homolog thereof is decreased by recombination with an inactive gene. In another embodiment, a sequence that encodes a different gene, which can have a separate function in the cell or can be a reporter gene such as a resistance marker or an otherwise detectable marker gene, can be inserted into the genome through recombination. In yet another embodiment, a copy of at least a portion of the target gene that has been mutated at one or more locations is inserted into the genome through recombination. The mutated version of the target gene may not encode a protein, or the protein encoded by the mutated gene may be rendered inactive, the activity may be modulated (either increased or decreased), or the mutant protein can have a different activity when compared to the native protein.

There are strategies to knock out genes in bacteria, which have been generally exemplified in E. coli. One route is to clone a gene-internal DNA fragment into a vector containing an antibiotic resistance gene (e.g. ampicillin). Before cells are transformed via conjugative transfer, chemical transformation or electroporation (Puehler, et al. (1984) Advanced Molecular Genetics New York, Heidelberg, Berlin, Tokyo, Springer Verlag), an origin of replication, such as the origin of vegetative plasmid replication (the oriV locus), is excised and the remaining DNA fragment is re-ligated and purified (Sambrook, et al. (2000) Molecular cloning: A laboratory manual, third edition Cold Spring Harbor, N.Y., Cold Spring Harbor Laboratory Press). Alternatively, antibiotic-resistant plasmids that have a DNA replication origin can be used. After transformation, the cells are plated onto e.g. LB agar plates containing the appropriate antibiotics (e.g. 200 micrograms/mL ampicillin). Colonies that grow on the plates containing the antibiotics presumably have undergone a single recombination event (Snyder, L., W. Champness, et al. (1997) Molecular Genetics of Bacteria Washington D.C., ASM Press) that leads to the integration of the entire DNA fragment into the genome at the homologous locus. Further analysis of the antibiotic-resistant cells to verify that the desired gene knock-out has occurred at the desired locus can be performed, e.g., by diagnostic PCR (McPherson, M. J., P. Quirke, et al. (1991) PCR: A Practical Approach New York, Oxford University Press). Here, at least two PCR primers are designed: one that hybridizes outside the DNA region that was used for the construction of the gene knock-out; and one that hybridizes within the remaining plasmid backbone. Successful PCR amplification of the DNA fragment with the correct size followed by DNA sequence analysis will verify that the gene knock-out has occurred at the correct location in the bacterial chromosome. The phenotype of the newly constructed mutant strain can then be analyzed by, e.g., SDS polyacrylamide gel electrophoresis (Simpson, R. J. (2003) Proteins and Proteomics: A Laboratory Manual. Cold Spring Harbor, N.Y., Cold Spring Harbor Laboratory Press).
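The logic of the diagnostic PCR can be sketched in silico: a product of the expected size appears only when the plasmid backbone has integrated next to the chromosomal flank. The following is an illustrative sketch only; the sequences, primer choices, and function names are hypothetical, and exact string matching stands in for primer hybridization.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def expected_amplicon(template: str, fwd_primer: str, rev_primer: str):
    """Amplicon length in bp, or None if either primer site is absent."""
    template = template.upper()
    start = template.find(fwd_primer.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the top strand's reverse complement.
    rev_site = reverse_complement(rev_primer)
    end = template.find(rev_site, start + len(fwd_primer))
    if end == -1:
        return None
    return end + len(rev_site) - start


# Hypothetical toy sequences (real loci would be hundreds of bp or more).
chromosomal_flank = "GATTACAGGC"   # lies outside the cloned fragment
cloned_fragment = "TTGACCGTAA"     # gene-internal fragment used for knock-out
plasmid_backbone = "CCGGATCTCG"    # present only after integration

integrated_locus = chromosomal_flank + cloned_fragment + plasmid_backbone
wild_type_locus = chromosomal_flank + cloned_fragment + "AAAAAAAAAA"

fwd = "GATTACA"   # hybridizes outside the region used for the construct
rev = "CGAGATC"   # hybridizes within the plasmid backbone

print(expected_amplicon(integrated_locus, fwd, rev))
print(expected_amplicon(wild_type_locus, fwd, rev))
```

A product is predicted for the integrated locus and none for the wild-type locus, which mirrors the reasoning behind placing one primer outside the cloned region and one in the backbone.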

An alternate route to generate a gene knock-out is by use of a temperature-sensitive replicon, such as the pSC101 replicon, to facilitate gene replacement (Hamilton, et al. (1989) Journal of Bacteriology 171(9): 4617-22). The process proceeds by homologous recombination between a gene on a chromosome and homologous sequences carried on a plasmid temperature sensitive for DNA replication. After transformation of the plasmid into the appropriate host, it is possible to select for integration of the plasmid into the chromosome at 44° C. Subsequent growth of these cointegrates at 30° C. leads to a second recombination event, resulting in their resolution. Depending on where the second recombination event takes place, the chromosome will either have undergone a gene replacement or retain the original copy of the gene.

Other strategies have been developed to inhibit expression of particular gene products. For example, RNA interference (RNAi), particularly using small interfering RNA (siRNA), has been extensively developed to reduce or even eliminate expression of a particular gene product. siRNAs are short, double-stranded RNA molecules that can target complementary mRNAs for degradation. RNAi is the phenomenon in which introduction of a double-stranded RNA suppresses the expression of the homologous gene. dsRNA molecules are reduced in vivo to 21-23 nt siRNAs which are the mediators of the RNAi effect. Upon introduction, double stranded RNAs get processed into 20-25 nucleotide siRNAs by an RNase III-like enzyme called Dicer (initiation step). Then, the siRNAs assemble into endoribonuclease-containing complexes known as RNA-induced silencing complexes (RISCs), unwinding in the process. The siRNA strands subsequently guide the RISCs to complementary RNA molecules, where they cleave and destroy the cognate RNA (effector step). Cleavage of cognate RNA takes place near the middle of the region bound by the siRNA strand. RNAi has been successfully used to reduce gene expression in a variety of organisms including zebrafish, nematodes (C. elegans), insects (Drosophila melanogaster), planaria, cnidaria, trypanosomes, mice and mammalian cells.
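The complementarity between an siRNA strand and its cognate mRNA can be made concrete with a small enumeration. The sketch below lists every 21-nt window of a hypothetical mRNA together with the antisense strand that would pair with it; the window length, the example sequence, and the function names are assumptions for illustration, and no design rules (GC content, off-target checks) are applied.

```python
def rna_reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence (A, C, G, U only)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(rna.upper()))


def candidate_guides(mrna: str, length: int = 21):
    """Return (start, antisense strand) for every window of the mRNA."""
    mrna = mrna.upper().replace("T", "U")  # accept DNA-style input
    return [
        (start, rna_reverse_complement(mrna[start:start + length]))
        for start in range(len(mrna) - length + 1)
    ]


# Hypothetical 24-nt target, chosen only to keep the output short.
example_mrna = "AUGGCUAGCUAGGCUUACGAUCGG"
guides = candidate_guides(example_mrna)
for start, guide in guides:
    print(start, guide)
```

Each listed strand is fully complementary to its target window, which is the property that lets the siRNA strand direct the silencing complex to the cognate RNA.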

The genome can also be modified by mutation of one or more nucleotides in an open reading frame encoding a target gene. Techniques for genetic mutation, for instance site directed mutagenesis, are well known in the art. Some approaches focus on the generation of random mutations in chromosomal DNA such as those induced by X-rays and chemicals.

Coexpression

In one embodiment, one or more target genes in the host cell can be modified by including one or more vectors that encode the target gene(s) to facilitate coexpression of the target gene with the heterologous protein or peptide. In another embodiment, the host cell is modified by enhancing a promoter for a target gene, including by adding an exogenous promoter to the host cell genome.

In another embodiment, one or more target genes in the host cell are modified by including one or more vectors that encode an inhibitor of a target gene, such as a protease inhibitor to inhibit the activity of a target protease. Such an inhibitor can be an antisense molecule that limits the expression of the target gene, a cofactor of the target gene or a homolog of the target gene. Antisense is generally used to refer to a nucleic acid molecule with a sequence complementary to at least a portion of the target gene. In addition, the inhibitor can be an interfering RNA or a gene that encodes an interfering RNA. In eukaryotic organisms, such an interfering RNA can be a small interfering RNA or a ribozyme, as described, for example, in Fire, A. et al. (1998) Nature 391:806-11, Elbashir et al. (2001) Genes & Development 15(2):188-200, Elbashir et al. (2001) Nature 411(6836):494-8, U.S. Pat. Nos. 6,506,559 to Carnegie Institute, 6,573,099 to Benitec, U.S. patent application Nos. 2003/0108923 to the Whitehead Inst., and 2003/0114409, PCT Publication Nos. WO03/006477, WO03/012052, WO03/023015, WO03/056022, WO03/064621 and WO03/070966.

The inhibitor can also be another protein or peptide. The inhibitor can, for example, be a peptide with a consensus sequence for the target protein. The inhibitor can also be a protein or peptide that can produce a direct or indirect inhibitory molecule for the target protein in the host. For example, protease inhibitors can include Amastatin, E-64, Antipain, Elastatinal, APMSF, Leupeptin, Bestatin, Pepstatin, Benzamidine, 1,10-Phenanthroline, Chymostatin, Phosphoramidon, 3,4-dichloroisocoumarin, TLCK, DFP, and TPCK. Over 100 naturally occurring protein protease inhibitors have been identified so far. They have been isolated from a variety of organisms, from bacteria to animals and plants. They behave as tight-binding reversible or pseudo-irreversible inhibitors of proteases, preventing substrate access to the active site through steric hindrance. Their sizes are also extremely variable, from 50 residues (e.g., BPTI: Bovine Pancreatic Trypsin Inhibitor) up to 400 residues (e.g., alpha-1PI: alpha-1 Proteinase Inhibitor). They are strictly class-specific, except proteins of the alpha-macroglobulin family (e.g., alpha-2 macroglobulin), which bind and inhibit most proteases through a molecular trap mechanism.

An exogenous vector or DNA construct can be transfected or transformed into the host cell. Techniques for transfecting and transforming eukaryotic and prokaryotic cells respectively with exogenous nucleic acids are well known in the art. These can include lipid vesicle mediated uptake, calcium phosphate mediated transfection (calcium phosphate/DNA co-precipitation), viral infection, particularly using modified viruses such as, for example, modified adenoviruses, microinjection and electroporation. For prokaryotic transformation, techniques can include heat shock mediated uptake, bacterial protoplast fusion with intact cells, microinjection and electroporation. Techniques for plant transformation include Agrobacterium mediated transfer, such as by A. tumefaciens, rapidly propelled tungsten or gold microprojectiles, electroporation, microinjection and polyethylene glycol mediated uptake. The DNA can be single or double stranded, linear or circular, relaxed or supercoiled DNA. For various techniques for transfecting mammalian cells, see, for example, Keown et al. (1990) Methods in Enzymology Vol. 185, pp. 527-537.

An expression construct encoding a target gene or an enhancer or inhibitor thereof can be constructed as described below for the expression constructs comprising the heterologous protein or polypeptide of interest. For example, the constructs can contain one, or more than one, internal ribosome entry site (IRES). The construct can also contain a promoter operably linked to the nucleic acid sequence encoding at least a portion of the target gene, or a cofactor of the target gene, a mutant version of at least a portion of the target gene, or in some embodiments, an inhibitor of the target gene. Alternatively, the construct can be promoterless. In cases in which the construct is not designed to incorporate into the cellular DNA/genome, the vector typically contains at least one promoter element. In addition to the nucleic acid sequences, the expression vector can contain selectable marker sequences. The expression constructs can further contain sites for transcription initiation, termination, and/or ribosome binding sites. The identified constructs can be inserted into and can be expressed in any prokaryotic or eukaryotic cell, including, but not limited to bacterial cells, such as P. fluorescens or E. coli, yeast cells, mammalian cells, such as CHO cells, or plant cells.

The construct can be prepared in accordance with processes known in the art. Various fragments can be assembled, introduced into appropriate vectors, cloned, analyzed and then manipulated further until the desired construct has been achieved. Various modifications can be made to the sequence, to allow for restriction analysis, excision, identification of probes, etc. Silent mutations can be introduced, as desired. At various stages, restriction analysis, sequencing, amplification with the polymerase chain reaction, primer repair, in vitro mutagenesis, etc. can be employed. Processes for the incorporation of antibiotic resistance genes and negative selection factors will be familiar to those of ordinary skill in the art (see, e.g., WO 99/15650; U.S. Pat. No. 6,080,576; U.S. Pat. No. 6,136,566; Niwa, et al., J. Biochem. 113:343-349 (1993); and Yoshida, et al., Transgenic Research, 4:277-287 (1995)).

The construct can be prepared using a bacterial vector, including a prokaryotic replication system, e.g. an origin recognizable by a prokaryotic cell such as P. fluorescens or E. coli. A marker, the same as or different from the marker to be used for insertion, can be employed, which can be removed prior to introduction into the host cell. Once the vector containing the construct has been completed, it can be further manipulated, such as by deletion of certain sequences, linearization, or by introducing mutations, deletions or other sequences into the homologous sequence. In one embodiment, the target gene construct and the heterologous protein construct are part of the same expression vector, and may or may not be under the control of the same promoter element. In another embodiment, they are on separate expression vectors. After final manipulation, the construct can be introduced into the cell.

Cell Growth Conditions

The cell growth conditions for the host cells described herein include that which facilitates expression of the protein of interest in at least one strain in the array (or at least a proportion of cells thereof), and/or that which facilitates fermentation of the expressed protein of interest. As used herein, the term “fermentation” includes both embodiments in which literal fermentation is employed and embodiments in which other, non-fermentative culture modes are employed. Growth, maintenance, and/or fermentation of the populations of host cells in the array may be performed at any scale. However, where multiple populations of host cells are screened simultaneously, the scale will be limited by the number of different populations and the capacity to grow and test multiple populations of host cells. In one embodiment, the fermentation medium may be selected from among rich media, minimal media, and mineral salts media. In another embodiment either a minimal medium or a mineral salts medium is selected. In still another embodiment, a minimal medium is selected. In yet another embodiment, a mineral salts medium is selected.

Mineral salts media consist of mineral salts and a carbon source such as, e.g., glucose, sucrose, or glycerol. Examples of mineral salts media include, e.g., M9 medium, Pseudomonas medium (ATCC 179), and Davis and Mingioli medium (see, BD Davis & ES Mingioli (1950) in J. Bact. 60:17-28). The mineral salts used to make mineral salts media include those selected from among, e.g., potassium phosphates, ammonium sulfate or chloride, magnesium sulfate or chloride, and trace minerals such as calcium chloride, borate, and sulfates of iron, copper, manganese, and zinc. No organic nitrogen source, such as peptone, tryptone, amino acids, or a yeast extract, is included in a mineral salts medium. Instead, an inorganic nitrogen source is used and this may be selected from among, e.g., ammonium salts, aqueous ammonia, and gaseous ammonia. A preferred mineral salts medium will contain glucose as the carbon source. In comparison to mineral salts media, minimal media can also contain mineral salts and a carbon source, but can be supplemented with, e.g., low levels of amino acids, vitamins, peptones, or other ingredients, though these are added at very minimal levels.

In one embodiment, media can be prepared using the components listed in Table 4 below. The components can be added in the following order: first (NH₄)₂HPO₄, KH₂PO₄ and citric acid can be dissolved in approximately 30 liters of distilled water; then a solution of trace elements can be added, followed by the addition of an antifoam agent, such as Ucolub N115. Then, after heat sterilization (such as at approximately 121° C.), sterile solutions of glucose, MgSO₄ and thiamine-HCl can be added. Control of pH at approximately 6.8 can be achieved using aqueous ammonia. Sterile distilled water can then be added to adjust the initial volume to 37 l minus the glycerol stock (123 mL). The chemicals are commercially available from various suppliers, such as Merck.

TABLE 4
Medium composition

Component | Initial concentration
KH₂PO₄ | 13.3 g l⁻¹
(NH₄)₂HPO₄ | 4.0 g l⁻¹
Citric Acid | 1.7 g l⁻¹
MgSO₄·7H₂O | 1.2 g l⁻¹
Trace metal solution | 10 ml l⁻¹
Thiamin HCl | 4.5 mg l⁻¹
Glucose·H₂O | 27.3 g l⁻¹
Antifoam Ucolub N115 | 0.1 ml l⁻¹

Feeding solution
MgSO₄·7H₂O | 19.7 g l⁻¹
Glucose·H₂O | 770 g l⁻¹
NH₃ | 23 g

Trace metal solution
Fe(III) citrate | 6 g l⁻¹
MnCl₂·4H₂O | 1.5 g l⁻¹
Zn(CH₃COO)₂·2H₂O | 0.8 g l⁻¹
H₃BO₃ | 0.3 g l⁻¹
Na₂MoO₄·2H₂O | 0.25 g l⁻¹
CoCl₂·6H₂O | 0.25 g l⁻¹
CuCl₂·2H₂O | 0.15 g l⁻¹
Ethylenedinitrilotetraacetic acid Na₂ salt·2H₂O (Titriplex III, Merck) | 0.84 g l⁻¹
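As a worked example of using Table 4, the per-liter concentrations can be scaled to a batch volume. The sketch below is illustrative only: the 10-liter volume is a hypothetical example, the dictionary keys are plain-text renderings of the component names, and only the gram-scale components of the initial medium are included.

```python
# Initial concentrations from Table 4, in grams per liter of medium.
INITIAL_G_PER_L = {
    "KH2PO4": 13.3,
    "(NH4)2HPO4": 4.0,
    "citric acid": 1.7,
    "MgSO4.7H2O": 1.2,
    "glucose.H2O": 27.3,
}

# Table 4 lists the trace metal solution at 10 ml per liter of medium.
TRACE_SOLUTION_ML_PER_L = 10.0


def batch_masses_g(volume_l: float) -> dict:
    """Grams of each component needed for the given batch volume."""
    if volume_l <= 0:
        raise ValueError("volume must be positive")
    return {name: round(conc * volume_l, 2)
            for name, conc in INITIAL_G_PER_L.items()}


def trace_final_mg_per_l(stock_g_per_l: float) -> float:
    """Final concentration of a trace component after dilution.

    A stock at X g/l contains X mg/ml, so adding 10 ml per liter of
    medium gives 10 * X mg per liter.
    """
    return stock_g_per_l * TRACE_SOLUTION_ML_PER_L


EXAMPLE_VOLUME_L = 10.0  # hypothetical batch size, not from the source
masses = batch_masses_g(EXAMPLE_VOLUME_L)
for name, grams in masses.items():
    print(f"{name}: {grams} g")
print(trace_final_mg_per_l(6.0), "mg/l for a 6 g/l stock component")
```

The same functions apply to any batch volume; the feed solution and the milligram-scale additions would be handled analogously.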

In the present invention, growth, culturing, and/or fermentation of the transformed host cells is performed within a temperature range permitting survival of the host cells, preferably a temperature within the range of about 4° C. to about 55° C., inclusive. Thus, e.g., the terms “growth” (and “grow,” “growing”), “culturing” (and “culture”), and “fermentation” (and “ferment,” “fermenting”), as used herein in regard to the host cells of the present invention, inherently mean “growth,” “culturing,” and “fermentation,” within a temperature range of about 4° C. to about 55° C., inclusive. In addition, “growth” is used to indicate both biological states of active cell division and/or enlargement, as well as biological states in which a non-dividing and/or non-enlarging cell is being metabolically sustained, the latter use of the term “growth” being synonymous with the term “maintenance.”

The host cells of the array should be grown and maintained at a suitable temperature for normal growth of that cell type. Such normal growth temperatures may be readily selected based on the known growth requirements of the selected host cell. Preferably, during the establishment of the culture and particularly during the course of the screening, the cell culture is incubated in a controlled CO₂/N₂ humidity suitable for growth of the selected cells before and after transformation with the heterologous protein or polypeptide of interest. The humidity of the incubation is controlled to minimize evaporation from the culture vessel, and permit the use of smaller volumes. Alternatively, or in addition to controlling humidity, the vessels may be covered with lids in order to minimize evaporation. Selection of the incubation temperature depends primarily upon the identity of the host cells utilized. Selection of the percent humidity to control evaporation is based upon the selected volume of the vessel and concentration and volume of the cell culture in the vessel, as well as upon the incubation temperature. Thus, the humidity may vary from about 10% to about 80%. It should be understood that selection of suitable conditions is well within the skill of the art.

Screening

The strain array described herein can be screened for the optimal host cell population in which to express a heterologous protein of interest. The optimal host cell population can be identified or selected based on the quantity, quality, and/or location of the expressed protein of interest. In one embodiment, the optimal host cell population is one that results in an increased yield of the protein or polypeptide of interest within the host cell compared to other populations of phenotypically distinct host cells in the array, e.g., an indicator expression system.

An indicator expression system is any heterologous protein expression system that is used for comparison of protein expression. An indicator expression system can be a) a second test expression system present in the same array or b) a standard expression system. A second test expression system refers to any test expression system on the array that is different from the expression system on the array that is being compared to the indicator expression system. A standard expression system is a heterologous protein expression system used as a standard, for example, one comprising a host from which the test expression system for comparison was derived, the host transformed with a heterologous protein expression vector that does not contain a secretion leader. In other embodiments the vector is the same as that used in the test expression system. A standard expression system for use in a Pseudomonas expression array of the invention can be, e.g., a DC454 expression system. A DC454 expression system refers to a DC454 host transformed with an expression vector encoding the heterologous protein. In other embodiments, the standard expression system contains expression elements (e.g., protease mutations, folding modulator overexpression constructs, secretion leaders) not present in a wild type expression system, but fewer or different expression elements than does the test expression system that is being compared. A standard expression system for use in an E. coli expression array of the invention can be, e.g., BL21(DE3), or any other appropriate strain selected by one of skill in the art for the experiment at hand. A null strain refers to a wild type host cell population transformed with a vector that does not express the heterologous protein.

The increased production alternatively can be an increased level of properly processed protein or polypeptide per gram of protein produced, or per gram of host protein. The increased production can also be an increased level of recoverable protein or polypeptide produced per gram of heterologous protein or per gram of host cell protein. The increased production can also be any combination of an increased level of total protein, increased level of properly processed or properly folded protein, or increased level of active or soluble protein. In this embodiment, the term “increased” or “improved” is relative to the level of protein or polypeptide that is produced, properly processed, soluble, and/or recoverable when the protein or polypeptide of interest is expressed in one or more other populations of host cells in the array. The increased production may optimize the efficiency of the cell or organism by, for example, decreasing the energy expenditure, increasing the use of available resources, or decreasing the requirements for growth supplements in growth media. The increased production may also be the result of a decrease in proteolytic degradation of the expressed protein.

In one embodiment, at least one strain in the array produces at least 0.1 mg/ml correctly processed protein. A correctly processed protein has an amino terminus of the native protein. In another embodiment, at least one strain produces 0.1 to 10 mg/ml correctly processed protein in the cell, including at least about 0.2, about 0.3, about 0.4, about 0.5, about 0.6, about 0.7, about 0.8, about 0.9 or at least about 1.0 mg/ml correctly processed protein. In another embodiment, the total correctly processed protein or polypeptide of interest produced by at least one strain in the array is at least 1.0 mg/ml, at least about 2 mg/ml, at least about 3 mg/ml, about 4 mg/ml, about 5 mg/ml, about 6 mg/ml, about 7 mg/ml, about 8 mg/ml, about 10 mg/ml, about 15 mg/ml, about 20 mg/ml, about 25 mg/ml, about 30 mg/ml, about 35 mg/ml, about 40 mg/ml, about 45 mg/ml, at least about 50 mg/ml, or greater. In some embodiments, the amount of correctly processed protein produced is at least about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 40%, about 50%, about 60%, about 70%, about 80%, about 90%, about 95%, about 96%, about 97%, about 98%, at least about 99%, or more of total heterologous protein in a correctly processed form.

An improved expression of a protein or polypeptide of interest can also refer to an increase in the solubility of the protein. The protein or polypeptide of interest can be produced and recovered from the cytoplasm, periplasm or extracellular medium of the host cell. The protein or polypeptide can be insoluble or soluble. The protein or polypeptide can include one or more targeting (e.g., signal or leader) sequences or sequences to assist purification, as discussed supra.

The term “soluble” as used herein means that the protein is not precipitated by centrifugation at between approximately 5,000 and 20,000× gravity when spun for 10-30 minutes in a buffer under physiological conditions. Soluble proteins are not part of an inclusion body or other precipitated mass. Similarly, “insoluble” means that the protein or polypeptide can be precipitated by centrifugation at between 5,000 and 20,000× gravity when spun for 10-30 minutes in a buffer under physiological conditions. Insoluble proteins or polypeptides can be part of an inclusion body or other precipitated mass. Some proteins, e.g., membrane proteins, can fractionate with the insoluble proteins, though they are active. Therefore, it is understood that an insoluble protein is not necessarily inactive. The term “inclusion body” is meant to include any intracellular body contained within a cell wherein an aggregate of proteins or polypeptides has been sequestered.

In another embodiment, the optimal host cell population produces an increased amount of the protein of interest that is transported to the periplasm or secreted into the extracellular space of the host cell. In one embodiment, at least one strain in the array produces at least 0.1 mg/ml protein in the periplasmic compartment. In another embodiment, at least one strain produces 0.1 to 10 mg/ml periplasmic protein in the cell, or at least about 0.2, about 0.3, about 0.4, about 0.5, about 0.6, about 0.7, about 0.8, about 0.9 or at least about 1.0 mg/ml periplasmic protein. In one embodiment, the total protein or polypeptide of interest produced by at least one strain in the array is at least 1.0 mg/ml, at least about 2 mg/ml, at least about 3 mg/ml, about 4 mg/ml, about 5 mg/ml, about 6 mg/ml, about 7 mg/ml, about 8 mg/ml, about 10 mg/ml, about 15 mg/ml, about 20 mg/ml, at least about 25 mg/ml, or greater. In some embodiments, the amount of periplasmic protein produced is at least about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 40%, about 50%, about 60%, about 70%, about 80%, about 90%, about 95%, about 96%, about 97%, about 98%, about 99%, or more of total protein or polypeptide of interest produced.

At least one strain in the array of the invention can also lead to increased yield of the protein or polypeptide of interest. In one embodiment, at least one strain produces a protein or polypeptide of interest as at least about 5%, at least about 10%, about 15%, about 20%, about 25%, about 30%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, or greater of total cell protein (tcp). “Percent total cell protein” is the amount of protein or polypeptide in the host cell as a percentage of aggregate cellular protein. Methods for the determination of the percent total cell protein are well known in the art.
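The definition of percent total cell protein given above reduces to a simple ratio. The following Python sketch is offered only as an illustration; the function names `percent_tcp` and `meets_tcp_threshold` and the example quantities are hypothetical, and the sketch assumes that both protein amounts have already been measured in the same units by any method known in the art.

```python
def percent_tcp(target_protein_mg, total_cell_protein_mg):
    """Amount of the protein of interest as a percentage of aggregate cellular protein."""
    if total_cell_protein_mg <= 0:
        raise ValueError("total cell protein must be positive")
    if not 0 <= target_protein_mg <= total_cell_protein_mg:
        raise ValueError("target protein must lie between 0 and total cell protein")
    return 100.0 * target_protein_mg / total_cell_protein_mg


def meets_tcp_threshold(target_protein_mg, total_cell_protein_mg, threshold_percent=5.0):
    """True if the strain reaches the chosen % tcp threshold (5% is one value named in the text)."""
    return percent_tcp(target_protein_mg, total_cell_protein_mg) >= threshold_percent


# Hypothetical measurement: 5 mg of target protein in 100 mg of total cell protein.
example_tcp = percent_tcp(5.0, 100.0)
```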

In a particular embodiment, at least one host cell population in the array can have a heterologous protein production level of at least 1% tcp and a cell density of at least 40 mg/ml, when grown (i.e., within a temperature range of about 4° C. to about 55° C., including about 10° C., about 15° C., about 20° C., about 25° C., about 30° C., about 35° C., about 40° C., about 45° C., and about 50° C.) in a mineral salts medium. In a particularly preferred embodiment, the expression system will have a protein or polypeptide expression level of at least 5% tcp and a cell density of at least 40 g/L, when grown (i.e., within a temperature range of about 4° C. to about 55° C., inclusive) in a mineral salts medium.

In practice, heterologous proteins targeted to the periplasm are often found in the broth (see European Patent No. EP 0 288 451), possibly because of damage to or an increase in the fluidity of the outer cell membrane. The rate of this “passive” secretion may be increased by using a variety of mechanisms that permeabilize the outer cell membrane, including: colicin (Miksch et al. (1997) Arch. Microbiol. 167: 143-150); growth rate (Shokri et al. (2002) Appl. Microbiol. Biotechnol. 58: 386-392); TolAIII overexpression (Wan and Baneyx (1998) Protein Expression Purif. 14: 13-22); bacteriocin release protein (Hsiung et al. (1989) Bio/Technology 7: 267-71); colicin A lysis protein (Lloubes et al. (1993) Biochimie 75: 451-8); mutants that leak periplasmic proteins (Furlong and Sundstrom (1989) Developments in Indus. Microbio. 30: 141-8); fusion partners (Jeong and Lee (2002) Appl. Environ. Microbio. 68: 4979-4985); or recovery by osmotic shock (Taguchi et al. (1990) Biochimica Biophysica Acta 1049: 278-85). Transport of engineered proteins to the periplasmic space with subsequent localization in the broth has been used to produce properly folded and active proteins in E. coli (Wan and Baneyx (1998) Protein Expression Purif. 14: 13-22; Simmons et al. (2002) J. Immun. Meth. 263: 133-147; Lundell et al. (1990) J. Indust. Microbio. 5: 215-27).

The method may also include the step of purifying the protein or polypeptide of interest from the periplasm or from extracellular media. The heterologous protein or polypeptide can be expressed in a manner in which it is linked to a tag protein and the “tagged” protein can be purified from the cell or extracellular media.

In some embodiments, the protein or polypeptide of interest can also be produced by at least one strain in the array in an active form. The term “active” means the presence of biological activity, wherein the biological activity is comparable or substantially corresponds to the biological activity of a corresponding native protein or polypeptide. In the context of proteins this typically means that a polynucleotide or polypeptide comprises a biological function or effect that has at least about 20%, about 50%, preferably at least about 60-80%, and most preferably at least about 90-95% activity compared to the corresponding native protein or polypeptide using standard parameters. However, in some embodiments, it may be desirable to produce a polypeptide that has altered or improved activity compared to the native protein (e.g., one that has altered or improved immunoreactivity, substrate specificity, etc.). An altered or improved polypeptide may result from a particular conformation created by one or more of the host cell populations of the array.

The determination of protein or polypeptide activity can be performed utilizing corresponding standard, targeted comparative biological assays for particular proteins or polypeptides which can be used to assess biological activity.

The recovery of active protein or polypeptide of interest may also be improved in the optimal host strain compared to one or more other strains in the array of the invention. Active proteins can have a specific activity of at least about 20%, at least about 30%, at least about 40%, about 50%, about 60%, at least about 70%, about 80%, about 90%, or at least about 95% that of the native protein or polypeptide from which the sequence is derived. Further, the substrate specificity (k_(cat)/K_(m)) is optionally substantially similar to the native protein or polypeptide. Typically, k_(cat)/K_(m) will be at least about 30%, about 40%, about 50%, about 60%, about 70%, about 80%, at least about 90%, at least about 95%, or greater. Methods of assaying and quantifying measures of protein and polypeptide activity and substrate specificity (k_(cat)/K_(m)) are well known to those of skill in the art.
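For illustration, the comparison of k_(cat)/K_(m) between a heterologously expressed protein and its native counterpart can be sketched as below. The function names and the kinetic constants are hypothetical placeholders, and the sketch assumes that k_(cat) and K_(m) have already been determined for both proteins under the same assay conditions and in the same units.

```python
def catalytic_efficiency(k_cat, k_m):
    """Return k_cat / K_m for one enzyme preparation."""
    if k_m <= 0:
        raise ValueError("K_m must be positive")
    return k_cat / k_m


def percent_of_native(test_value, native_value):
    """Express a test measurement as a percentage of the native protein's value."""
    if native_value <= 0:
        raise ValueError("native value must be positive")
    return 100.0 * test_value / native_value


# Hypothetical constants (k_cat in 1/s, K_m in mM) for native and expressed enzyme.
native_eff = catalytic_efficiency(100.0, 2.0)    # 50.0
expressed_eff = catalytic_efficiency(90.0, 2.0)  # 45.0
relative_eff = percent_of_native(expressed_eff, native_eff)
```

The same `percent_of_native` helper can be applied to specific activity values when those, rather than kinetic constants, are the quantity being compared.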

Measurement of Protein Activity

The activity of the heterologously-expressed protein or polypeptide of interest can be compared with a previously established native protein or polypeptide standard activity. Alternatively, the activity of the protein or polypeptide of interest can be determined in a simultaneous, or substantially simultaneous, comparative assay with the native protein or polypeptide. For example, in vitro assays can be used to determine any detectable interaction between a protein or polypeptide of interest and a target, e.g., between an expressed enzyme and substrate, between expressed hormone and hormone receptor, between expressed antibody and antigen, etc. Such detection can include the measurement of calorimetric changes, proliferation changes, cell death, cell repelling, changes in radioactivity, changes in solubility, changes in molecular weight as measured by gel electrophoresis and/or gel exclusion methods, phosphorylation abilities, antibody specificity assays such as ELISA assays, etc. In addition, in vivo assays include, but are not limited to, assays to detect physiological effects of the heterologously expressed protein or polypeptide in comparison to physiological effects of the native protein or polypeptide, e.g., weight gain, change in electrolyte balance, change in blood clotting time, changes in clot dissolution and the induction of antigenic response. Generally, any in vitro or in vivo assay can be used to determine the active nature of the protein or polypeptide of interest that allows for a comparative analysis to the native protein or polypeptide so long as such activity is assayable. Alternatively, the proteins or polypeptides produced in at least one strain in the array of the present invention can be assayed for the ability to stimulate or inhibit interaction between the protein or polypeptide and a molecule that normally interacts with the protein or polypeptide, e.g., a substrate or a component of a signal pathway with which the native protein normally interacts. Such assays can typically include the steps of combining the protein with a substrate molecule under conditions that allow the protein or polypeptide to interact with the target molecule, and detecting the biochemical consequence of the interaction between the protein and the target molecule.

Assays that can be utilized to determine protein or polypeptide activity are described, for example, in Ralph, P. J., et al. (1984) J. Immunol. 132:1858; Saiki et al. (1981) J. Immunol. 127:1044; Stewart, W. E. II (1980) The Interferon Systems, Springer-Verlag, Vienna and New York; Broxmeyer, H. E., et al. (1982) Blood 60:595; Molecular Cloning: A Laboratory Manual, 2d ed., Cold Spring Harbor Laboratory Press, Sambrook, J., E. F. Fritsch and T. Maniatis eds., 1989; Methods in Enzymology: Guide to Molecular Cloning Techniques, Academic Press, Berger, S. L. and A. R. Kimmel eds., 1987; A K Patra et al., Protein Expr Purif, 18(2): p. 182-92 (2000); Kodama et al., J. Biochem. 99: 1465-1472 (1986); Stewart et al., Proc. Nat'l Acad. Sci. USA 90: 5209-5213 (1993); Lombillo et al., J. Cell Biol. 128:107-115 (1995); and Vale et al., Cell 42:39-50 (1985). Activity can be compared between samples of heterologously expressed protein derived from one or more of the other host cell populations in the array, or can be compared to the activity of a native protein, or both. Activity measurements can be performed on isolated protein, or can be performed in vitro in the host cell.

In another embodiment, protein production and/or activity may be monitored directly in the culture by fluorescence or spectroscopic measurements on, for example, a conventional microscope, luminometer, or plate reader. Where the protein of interest is an enzyme whose substrate is known, the substrate can be added to the culture media wherein a fluorescent signal is emitted when the substrate is converted by the enzyme into a product. In one embodiment, the expression construct encoding the heterologous protein or polypeptide of interest further encodes a reporter protein. By “reporter protein” is meant a protein that by its presence in or on a cell or when secreted in the media allows the cell to be distinguished from a cell that does not contain the reporter protein. Production of the heterologous protein of interest results in a detectable change in the host cell population. The reporter molecule can be firefly luciferase, GFP or any other fluorescent molecule, as well as the product of the beta-galactosidase gene (beta-gal) or the chloramphenicol acetyltransferase gene (CAT). Assays for expression produced in conjunction with each of these reporter gene elements are well-known to those skilled in the art.

The reporter gene can encode a detectable protein or an indirectly detectable protein, or the reporter gene can be a survival gene. In a preferred embodiment, the reporter protein is a detectable protein. A “detectable protein” or “detection protein” (encoded by a detectable or detection gene) is a protein that can be used as a direct label; that is, the protein is detectable (and preferably, a cell comprising the detectable protein is detectable) without further manipulation. Thus, in this embodiment, the protein product of the reporter gene itself can serve to distinguish cells that are expressing the detectable gene. In this embodiment, suitable detectable genes include those encoding autofluorescent proteins.

As is known in the art, there are a variety of autofluorescent proteins known; these generally are based on the green fluorescent protein (GFP) from Aequorea and variants thereof, including, but not limited to, GFP (Chalfie et al. (1994) Science 263(5148):802-805); enhanced GFP (EGFP; Clontech, Genbank Accession Number U55762); blue fluorescent protein (BFP; Quantum Biotechnologies, Inc., Montreal, Canada; Stauber (1998) Biotechniques 24(3):462-471; Heim and Tsien (1996) Curr. Biol. 6:178-182); enhanced yellow fluorescent protein (EYFP; Clontech Laboratories, Inc., Palo Alto, Calif.); and red fluorescent protein. In addition, there are recent reports of autofluorescent proteins from Renilla and Ptilosarcus species. See WO 92/15673; WO 95/07463; WO 98/14605; WO 98/26277; WO 99/49019; U.S. Pat. No. 5,292,658; U.S. Pat. No. 5,418,155; U.S. Pat. No. 5,683,888; U.S. Pat. No. 5,741,668; U.S. Pat. No. 5,777,079; U.S. Pat. No. 5,804,387; U.S. Pat. No. 5,874,304; U.S. Pat. No. 5,876,995; and U.S. Pat. No. 5,925,558; all of which are expressly incorporated herein by reference.

Isolation of Protein or Polypeptide of Interest

To measure the yield, solubility, conformation, and/or activity of the protein of interest, it may be desirable to isolate the protein from one or more strains in the array. The isolation may be a crude, semi-crude, or pure isolation, depending on the requirements of the assay used to make the appropriate measurements. The protein may be produced in the cytoplasm, targeted to the periplasm, or may be secreted into the culture or fermentation media. To release proteins targeted to the periplasm, treatments involving chemicals such as chloroform (Ames et al. (1984) J. Bacteriol., 160: 1181-1183), guanidine-HCl, and Triton X-100 (Naglak and Wang (1990) Enzyme Microb. Technol., 12: 603-611) have been used. However, these chemicals are not inert and may have detrimental effects on many heterologous protein products or subsequent purification procedures. Glycine treatment of E. coli cells, causing permeabilization of the outer membrane, has also been reported to release the periplasmic contents (Ariga et al. (1989) J. Ferm. Bioeng., 68: 243-246). The most widely used methods of periplasmic release of heterologous protein are osmotic shock (Nosal and Heppel (1966) J. Biol. Chem., 241: 3055-3062; Neu and Heppel (1965) J. Biol. Chem., 240: 3685-3692), hen egg white (HEW)-lysozyme/ethylenediamine tetraacetic acid (EDTA) treatment (Neu and Heppel (1964) J. Biol. Chem., 239: 3893-3900; Witholt et al. (1976) Biochim. Biophys. Acta, 443: 534-544; Pierce et al. (1995) IChemE Research Event, 2: 995-997), and combined HEW-lysozyme/osmotic shock treatment (French et al. (1996) Enzyme and Microb. Tech., 19: 332-338). The method of French et al. involves resuspension of the cells in a fractionation buffer followed by recovery of the periplasmic fraction, where osmotic shock immediately follows lysozyme treatment.

Typically, these procedures include an initial disruption in osmotically-stabilizing medium followed by selective release in non-stabilizing medium. The composition of these media (pH, protective agent) and the disruption methods used (chloroform, HEW-lysozyme, EDTA, sonication) vary among specific procedures reported. A variation on the HEW-lysozyme/EDTA treatment using a dipolar ionic detergent in place of EDTA is discussed by Stabel et al. (1994) Veterinary Microbiol., 38: 307-314. For a general review of use of intracellular lytic enzyme systems to disrupt E. coli, see Dabora and Cooney (1990) in Advances in Biochemical Engineering/Biotechnology, Vol. 43, A. Fiechter, ed. (Springer-Verlag: Berlin), pp. 11-30. Conventional methods for the recovery of proteins or polypeptides of interest from the cytoplasm, as soluble protein or refractile particles, involved disintegration of the bacterial cell by mechanical breakage. Mechanical disruption typically involves the generation of local cavitation in a liquid suspension, rapid agitation with rigid beads, sonication, or grinding of cell suspension (Bacterial Cell Surface Techniques, Hancock and Poxton (John Wiley & Sons Ltd, 1988), Chapter 3, p. 55).

HEW-lysozyme acts biochemically to hydrolyze the peptidoglycan backbone of the cell wall. The method was first developed by Zinder and Arndt (1956) Proc. Natl. Acad. Sci. USA, 42: 586-590, who treated E. coli with egg albumin (which contains HEW-lysozyme) to produce rounded cellular spheres later known as spheroplasts. These structures retained some cell-wall components but had large surface areas in which the cytoplasmic membrane was exposed. U.S. Pat. No. 5,169,772 discloses a method for purifying heparinase from bacteria comprising disrupting the envelope of the bacteria in an osmotically-stabilized medium, e.g., 20% sucrose solution using, e.g., EDTA, lysozyme, or an organic compound, releasing the non-heparinase-like proteins from the periplasmic space of the disrupted bacteria by exposing the bacteria to a low-ionic-strength buffer, and releasing the heparinase-like proteins by exposing the low-ionic-strength-washed bacteria to a buffered salt solution.

Many different modifications of these methods have been used on a wide range of expression systems with varying degrees of success (Joseph-Liauzun et al. (1990) Gene, 86: 291-295; Carter et al. (1992) Bio/Technology, 10: 163-167). Efforts to induce recombinant cell culture to produce lysozyme have been reported. EP 0 155 189 discloses a means for inducing a recombinant cell culture to produce lysozymes, which would ordinarily be expected to kill such host cells by means of destroying or lysing the cell wall structure.

U.S. Pat. No. 4,595,658 discloses a method for facilitating externalization of proteins transported to the periplasmic space of bacteria. This method allows selective isolation of proteins that locate in the periplasm without the need for lysozyme treatment, mechanical grinding, or osmotic shock treatment of cells. U.S. Pat. No. 4,637,980 discloses producing a bacterial product by transforming a temperature-sensitive lysogen with a DNA molecule that codes, directly or indirectly, for the product, culturing the transformant under permissive conditions to express the gene product intracellularly, and externalizing the product by raising the temperature to induce phage-encoded functions. Asami et al. (1997) J. Ferment. and Bioeng., 83: 511-516 discloses synchronized disruption of E. coli cells by T4 phage infection, and Tanji et al. (1998) J. Ferment. and Bioeng., 85: 74-78 discloses controlled expression of lysis genes encoded in T4 phage for the gentle disruption of E. coli cells.

Upon cell lysis, genomic DNA leaks out of the cytoplasm into the medium and results in a significant increase in fluid viscosity that can impede the sedimentation of solids in a centrifugal field. In the absence of shear forces such as those exerted during mechanical disruption to break down the DNA polymers, the slower sedimentation rate of solids through viscous fluid results in poor separation of solids and liquid during centrifugation. Other than mechanical shear force, there exist nucleolytic enzymes that degrade DNA polymer. In E. coli, the endogenous gene endA encodes an endonuclease (molecular weight of the mature protein is approx. 24.5 kD) that is normally secreted to the periplasm and cleaves DNA into oligodeoxyribonucleotides in an endonucleolytic manner. It has been suggested that endA is relatively weakly expressed by E. coli (Wackernagel et al. (1995) Gene 154: 55-59).

If desired, the proteins produced using one or more strains in the array of this invention may be isolated and purified to substantial purity by standard techniques well known in the art, including, but not limited to, ammonium sulfate or ethanol precipitation, acid extraction, anion or cation exchange chromatography, phosphocellulose chromatography, hydrophobic interaction chromatography, affinity chromatography, nickel chromatography, hydroxylapatite chromatography, reverse phase chromatography, lectin chromatography, preparative electrophoresis, detergent solubilization, selective precipitation, column chromatography, immunopurification methods, and others. For example, proteins having established molecular adhesion properties can be reversibly fused with a ligand. With the appropriate ligand, the protein can be selectively adsorbed to a purification column and then freed from the column in a relatively pure form. The fused protein is then removed by enzymatic activity. In addition, protein can be purified using immunoaffinity columns or Ni-NTA columns. General techniques are further described in, for example, R. Scopes, Protein Purification: Principles and Practice, Springer-Verlag: N.Y. (1982); Deutscher, Guide to Protein Purification, Academic Press (1990); U.S. Pat. No. 4,511,503; S. Roe, Protein Purification Techniques: A Practical Approach (Practical Approach Series), Oxford Press (2001); D. Bollag, et al., Protein Methods, Wiley-Liss, Inc. (1996); A K Patra et al., Protein Expr Purif, 18(2): p. 182-92 (2000); and R. Mukhija, et al., Gene 165(2): p. 303-6 (1995). See also, for example, Ausubel, et al. (1987 and periodic supplements); Deutscher (1990) “Guide to Protein Purification,” Methods in Enzymology vol. 182, and other volumes in this series; Coligan, et al. (1996 and periodic Supplements) Current Protocols in Protein Science, Wiley/Greene, NY; and manufacturer's literature on use of protein purification products, e.g., Pharmacia, Piscataway, N.J., or Bio-Rad, Richmond, Calif. Combination with recombinant techniques allows fusion to appropriate segments, e.g., to a FLAG sequence or an equivalent which can be fused via a protease-removable sequence. See also, for example, Hochuli (1989) Chemische Industrie 12:69-70; Hochuli (1990) “Purification of Recombinant Proteins with Metal Chelate Absorbent” in Setlow (ed.) Genetic Engineering, Principle and Methods 12:87-98, Plenum Press, NY; and Crowe, et al. (1992) QIAexpress: The High Level Expression & Protein Purification System, QIAGEN, Inc., Chatsworth, Calif.

Detection of the expressed protein is achieved by methods known in the art, including, for example, radioimmunoassays, Western blotting techniques or immunoprecipitation.

Certain proteins expressed by the strains in the array of this invention may form insoluble aggregates (“inclusion bodies”). Several protocols are suitable for purification of proteins from inclusion bodies. For example, purification of inclusion bodies typically involves the extraction, separation and/or purification of inclusion bodies by disruption of the host cells, e.g., by incubation in a buffer of 50 mM Tris/HCl pH 7.5, 50 mM NaCl, 5 mM MgCl₂, 1 mM DTT, 0.1 mM ATP, and 1 mM PMSF. The cell suspension is typically lysed using 2-3 passages through a French Press. The cell suspension can also be homogenized using a Polytron (Brinkman Instruments) or sonicated on ice. Alternate methods of lysing bacteria are apparent to those of skill in the art (see, e.g., Sambrook et al., supra; Ausubel et al., supra).

If necessary, the inclusion bodies can be solubilized, and the lysed cell suspension typically can be centrifuged to remove unwanted insoluble matter. Proteins that formed the inclusion bodies may be renatured by dilution or dialysis with a compatible buffer. Suitable solvents include, but are not limited to, urea (from about 4 M to about 8 M), formamide (at least about 80%, volume/volume basis), and guanidine hydrochloride (from about 4 M to about 8 M). Although guanidine hydrochloride and similar agents are denaturants, this denaturation is not irreversible and renaturation may occur upon removal (by dialysis, for example) or dilution of the denaturant, allowing re-formation of immunologically and/or biologically active protein. Other suitable buffers are known to those skilled in the art.

The heterologously-expressed proteins present in the supernatant can be separated from the host proteins by standard separation techniques well known to those of skill in the art. For example, an initial salt fractionation can separate many of the unwanted host cell proteins (or proteins derived from the cell culture media) from the protein or polypeptide of interest. One such example can be ammonium sulfate. Ammonium sulfate precipitates proteins by effectively reducing the amount of water in the protein mixture. Proteins then precipitate on the basis of their solubility. The more hydrophobic a protein is, the more likely it is to precipitate at lower ammonium sulfate concentrations. A typical protocol includes adding saturated ammonium sulfate to a protein solution so that the resultant ammonium sulfate concentration is between 20% and 30%. This concentration will precipitate the most hydrophobic of proteins. The precipitate is then discarded (unless the protein of interest is hydrophobic) and ammonium sulfate is added to the supernatant to a concentration known to precipitate the protein of interest. The precipitate is then solubilized in buffer and the excess salt removed if necessary, either through dialysis or diafiltration. Other methods that rely on solubility of proteins, such as cold ethanol precipitation, are well known to those of skill in the art and can be used to fractionate complex protein mixtures.
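As a worked illustration of the ammonium sulfate step, the volume of saturated (100%) ammonium sulfate solution needed to bring a sample to a target percent saturation follows from a simple mixing balance, V_add = V_0 (S_2 − S_1) / (1 − S_2), where S_1 and S_2 are the starting and target saturations expressed as fractions. The Python sketch below assumes that volumes are additive, which is an approximation, and that saturated solution rather than solid salt is added; the function name and example volumes are hypothetical.

```python
import math


def saturated_volume_to_add(sample_volume_ml, start_fraction, target_fraction):
    """Volume (ml) of saturated ammonium sulfate solution to add.

    Fractions are saturation levels between 0 and 1 (e.g. 0.25 for 25%).
    Assumes additive volumes; this is a simplification for illustration.
    """
    if not 0 <= start_fraction < target_fraction < 1:
        raise ValueError("require 0 <= start < target < 1")
    return sample_volume_ml * (target_fraction - start_fraction) / (1 - target_fraction)


# Hypothetical example: bring 100 ml of salt-free sample to 25% saturation.
volume_for_25_percent = saturated_volume_to_add(100.0, 0.0, 0.25)

# Check the mixing balance: resulting saturation equals the target.
resulting_fraction = volume_for_25_percent / (100.0 + volume_for_25_percent)
assert math.isclose(resulting_fraction, 0.25)
```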

The molecular weight of a protein or polypeptide of interest can be used to isolate it from proteins of greater and lesser size using ultrafiltration through membranes of different pore size (for example, Amicon or Millipore membranes). As a first step, the protein mixture can be ultrafiltered through a membrane with a pore size that has a lower molecular weight cut-off than the molecular weight of the protein of interest. The retentate of the ultrafiltration can then be ultrafiltered against a membrane with a molecular weight cut-off greater than the molecular weight of the protein of interest. The protein or polypeptide of interest will pass through the membrane into the filtrate. The filtrate can then be chromatographed as described below.
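The two-step ultrafiltration described above can be sketched as a selection of two membranes from a set of available molecular weight cut-offs. The function name `choose_membranes` and the list of available cut-offs are hypothetical, and the sketch uses only the rule stated in the text (one cut-off below and one above the molecular weight of the protein of interest); in practice a wider margin between cut-off and protein size may be preferred.

```python
def choose_membranes(protein_kda, available_cutoffs_kda):
    """Pick (retaining_cutoff, passing_cutoff) bracketing the protein's molecular weight.

    Step 1 membrane: largest cut-off below the protein, so the protein stays in the retentate.
    Step 2 membrane: smallest cut-off above the protein, so the protein passes into the filtrate.
    """
    lower = [c for c in available_cutoffs_kda if c < protein_kda]
    upper = [c for c in available_cutoffs_kda if c > protein_kda]
    if not lower or not upper:
        raise ValueError("no pair of cut-offs brackets the protein size")
    return max(lower), min(upper)


# Hypothetical 40 kDa protein and a hypothetical set of membrane cut-offs (kDa).
retain_on, pass_through = choose_membranes(40, [10, 30, 50, 100])
```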

The expressed proteins or polypeptides of interest can also be separated from other proteins on the basis of their size, net surface charge, hydrophobicity, and affinity for ligands. In addition, antibodies raised against proteins can be conjugated to column matrices and the proteins immunopurified. All of these methods are well known in the art. It will be apparent to one of skill that chromatographic techniques can be performed at any scale and using equipment from many different manufacturers (e.g., Pharmacia Biotech).

Renaturation and Refolding

Where heterologously expressed protein is produced in a denatured form, insoluble protein can be renatured or refolded to generate secondary and tertiary protein structure conformation. Protein refolding steps can be used, as necessary, in completing configuration of the heterologous product. Refolding and renaturation can be accomplished using an agent that is known in the art to promote dissociation/association of proteins. For example, the protein can be incubated with dithiothreitol followed by incubation with oxidized glutathione disodium salt followed by incubation with a buffer containing a refolding agent such as urea.

The protein or polypeptide of interest can also be renatured, for example, by dialyzing it against phosphate-buffered saline (PBS) or 50 mM Na-acetate, pH 6 buffer plus 200 mM NaCl. Alternatively, the protein can be refolded while immobilized on a column, such as a Ni-NTA column, by using a linear 6 M-1 M urea gradient in 500 mM NaCl, 20% glycerol, 20 mM Tris/HCl pH 7.4, containing protease inhibitors. The renaturation can be performed over a period of 1.5 hours or more. After renaturation the proteins can be eluted by the addition of 250 mM imidazole. Imidazole can be removed by a final dialyzing step against PBS or 50 mM sodium acetate pH 6 buffer plus 200 mM NaCl. The purified protein can be stored at 4° C. or frozen at −80° C.
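The linear 6 M to 1 M urea gradient mentioned above can be illustrated by computing the urea concentration at any time during the run. The sketch below assumes a gradient length of 90 minutes (the 1.5 hour minimum stated in the text) and a strictly linear profile; the function name and default arguments are hypothetical conveniences, not a prescribed protocol.

```python
def urea_molarity(elapsed_min, start_molar=6.0, end_molar=1.0, duration_min=90.0):
    """Urea concentration (M) at a given time in a linear on-column refolding gradient."""
    if duration_min <= 0:
        raise ValueError("duration must be positive")
    # Clamp the time so that values outside the run return the end-point concentrations.
    t = min(max(elapsed_min, 0.0), duration_min)
    return start_molar + (end_molar - start_molar) * t / duration_min


# Concentrations at the start, midpoint and end of a 90 minute gradient.
gradient_points = [urea_molarity(t) for t in (0.0, 45.0, 90.0)]
```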

Other methods include, for example, those that may be described in M. H. Lee et al., Protein Expr. Purif., 25(1): p. 166-73 (2002); W. K. Cho et al., J. Biotechnology, 77(2-3): p. 169-78 (2000); Ausubel, et al. (1987 and periodic supplements); Deutscher (1990) “Guide to Protein Purification,” Methods in Enzymology vol. 182, and other volumes in this series; Coligan, et al. (1996 and periodic Supplements) Current Protocols in Protein Science, Wiley/Greene, NY; S. Roe, Protein Purification Techniques: A Practical Approach (Practical Approach Series), Oxford Press (2001); and D. Bollag, et al., Protein Methods, Wiley-Liss, Inc. (1996).

Expression Vectors

A heterologous protein of interest can be produced in one or more of the host cells disclosed herein by introducing into each strain an expression vector encoding the heterologous protein of interest. In one embodiment, the vector comprises a polynucleotide sequence encoding the protein of interest operably linked to a promoter capable of functioning in the chosen host cell, as well as all other required transcription and translation regulatory elements.

The term “operably linked” refers to any configuration in which the transcriptional and any translational regulatory elements are covalently attached to the encoding sequence in such disposition(s), relative to the coding sequence, that in and by action of the host cell, the regulatory elements can direct the expression of the coding sequence.

The heterologous protein of interest can be expressed from polynucleotides in which the heterologous polypeptide coding sequence is operably linked to transcription and translation regulatory elements to form a functional gene from which the host cell can express the protein or polypeptide. The coding sequence can be a native coding sequence for the heterologous polypeptide, or may be a coding sequence that has been selected, improved, or optimized for use in the selected expression host cell: for example, by synthesizing the gene to reflect the codon use bias of a host species. In one embodiment of the invention, the host species is a P. fluorescens, and the codon bias of P. fluorescens is taken into account when designing the polypeptide coding sequence. The gene(s) are constructed within or inserted into one or more vector(s), which can then be transformed into the expression host cell.

Other regulatory elements may be included in a vector (also termed “expression construct”). The vector will typically comprise one or more phenotypic selectable markers and an origin of replication to ensure maintenance of the vector and to, if desirable, provide amplification within the host. Additional elements include, but are not limited to, for example, transcriptional enhancer sequences, translational enhancer sequences, other promoters, activators, translational start and stop signals, transcription terminators, cistronic regulators, polycistronic regulators, or tag sequences, such as nucleotide sequence “tags” and “tag” polypeptide coding sequences, which facilitate identification, separation, purification, and/or isolation of an expressed polypeptide.

In another embodiment, the expression vector further comprises a tag sequence adjacent to the coding sequence for the protein or polypeptide of interest. In one embodiment, this tag sequence allows for purification of the protein. The tag sequence can be an affinity tag, such as a hexa-histidine affinity tag (SEQ ID NO: 158). In another embodiment, the affinity tag can be a glutathione-S-transferase molecule. The tag can also be a fluorescent molecule, such as YFP or GFP, or analogs of such fluorescent proteins. The tag can also be a portion of an antibody molecule, or a known antigen or ligand for a known binding partner useful for purification.

A protein-encoding gene according to the present invention can include, in addition to the protein coding sequence, the following regulatory elements operably linked thereto: a promoter, a ribosome binding site (RBS), a transcription terminator, and translational start and stop signals. Useful RBSs can be obtained from any of the species useful as host cells in expression systems according to the present invention, preferably from the selected host cell. Many specific and a variety of consensus RBSs are known, e.g., those described in and referenced by D. Frishman et al., Gene 234(2):257-65 (8 Jul. 1999); and B. E. Suzek et al., Bioinformatics 17(12):1123-30 (December 2001). In addition, either native or synthetic RBSs may be used, e.g., those described in: EP 0207459 (synthetic RBSs); O. Ikehata et al., Eur. J. Biochem. 181(3):563-70 (1989) (native RBS sequence of AAGGAAG). Further examples of methods, vectors, and translation and transcription elements, and other elements useful in the present invention are described in, e.g.: U.S. Pat. No. 5,055,294 to Gilroy and U.S. Pat. No. 5,128,130 to Gilroy et al.; U.S. Pat. No. 5,281,532 to Rammler et al.; U.S. Pat. Nos. 4,695,455 and 4,861,595 to Barnes et al.; U.S. Pat. No. 4,755,465 to Gray et al.; and U.S. Pat. No. 5,169,760 to Wilcox.

Transcription of the DNA encoding the heterologous protein of interest can be increased by inserting an enhancer sequence into the vector or plasmid. Typical enhancers are cis-acting elements of DNA, usually from about 10 to 300 bp in size, that act on the promoter to increase its transcription. Examples include various Pseudomonas enhancers.

Generally, the heterologous expression vectors will include origins of replication and selectable markers permitting transformation of the host cell and a promoter derived from a highly-expressed gene to direct transcription of a downstream structural sequence. Such promoters can be derived from operons encoding enzymes such as 3-phosphoglycerate kinase (PGK), acid phosphatase, or heat shock proteins, among others. Where signal sequences are used, the heterologous coding sequence is assembled in appropriate phase with translation initiation and termination sequences, and the signal sequence capable of directing compartmental accumulation or secretion of the translated protein. Optionally the heterologous sequence can encode a fusion enzyme including an N-terminal identification polypeptide imparting desired characteristics, e.g., stabilization or simplified purification of expressed heterologous product. The fusion polypeptide can also comprise one or more target proteins or inhibitors or enhancers thereof, as discussed supra.

Vectors are known in the art for expressing heterologous proteins in host cells, and any of these may be used for expressing the genes according to the present invention. Such vectors include, e.g., plasmids, cosmids, and phage expression vectors. Examples of useful plasmid vectors include, but are not limited to, the expression plasmids pBBR1MCS, pDSK519, pKT240, pML122, pPS10, RK2, RK6, pRO1600, and RSF1010. Other examples of such useful vectors include those described by, e.g.: N. Hayase, in Appl. Envir. Microbiol. 60(9):3336-42 (September 1994); A. A. Lushnikov et al., in Basic Life Sci. 30:657-62 (1985); S. Graupner & W. Wackernagel, in Biomolec. Eng. 17(1):11-16 (October 2000); H. P. Schweizer, in Curr. Opin. Biotech. 12(5):439-45 (October 2001); M. Bagdasarian & K. N. Timmis, in Curr. Topics Microbiol. Immunol. 96:47-67 (1982); T. Ishii et al., in FEMS Microbiol. Lett. 116(3):307-13 (Mar. 1, 1994); I. N. Olekhnovich & Y. K. Fomichev, in Gene 140(1):63-65 (Mar. 11, 1994); M. Tsuda & T. Nakazawa, in Gene 136(1-2):257-62 (Dec. 22, 1993); C. Nieto et al., in Gene 87(1):145-49 (Mar. 1, 1990); J. D. Jones & N. Gutterson, in Gene 61(3):299-306 (1987); M. Bagdasarian et al., in Gene 16(1-3):237-47 (December 1981); H. P. Schweizer et al., in Genet. Eng. (NY) 23:69-81 (2001); P. Mukhopadhyay et al., in J. Bact. 172(1):477-80 (January 1990); D. O. Wood et al., in J. Bact. 145(3):1448-51 (March 1981); and R. Holtwick et al., in Microbiology 147(Pt 2):337-44 (February 2001).

Further examples of expression vectors that can be useful in a host cell of the invention include those listed in Table 5 as derived from the indicated replicons.

TABLE 5
Examples of Useful Expression Vectors

Replicon    Vector(s)
pPS10       pCN39, pCN51
RSF1010     pKT261-3, pMMB66EH, pEB8, pPLGN1, pMYC1050
RK2/RP1     pRK415, pJB653
pRO1600     pUCP, pBSP

The expression plasmid, RSF1010, is described, e.g., by F. Heffron et al., in Proc. Nat'l Acad. Sci. USA 72(9):3623-27 (September 1975), and by K. Nagahari & K. Sakaguchi, in J. Bact. 133(3):1527-29 (March 1978). Plasmid RSF1010 and derivatives thereof are particularly useful vectors in the present invention. Exemplary useful derivatives of RSF1010, which are known in the art, include, e.g., pKT212, pKT214, pKT231 and related plasmids, and pMYC1050 and related plasmids (see, e.g., U.S. Pat. Nos. 5,527,883 and 5,840,554 to Thompson et al.), such as, e.g., pMYC1803. Plasmid pMYC1803 is derived from the RSF1010-based plasmid pTJS260 (see U.S. Pat. No. 5,169,760 to Wilcox), which carries a regulated tetracycline resistance marker and the replication and mobilization loci from the RSF1010 plasmid. Other exemplary useful vectors include those described in U.S. Pat. No. 4,680,264 to Puhler et al.

In one embodiment, an expression plasmid is used as the expression vector. In another embodiment, RSF1010 or a derivative thereof is used as the expression vector. In still another embodiment, pMYC1050 or a derivative thereof, or pMYC4803 or a derivative thereof, is used as the expression vector.

The plasmid can be maintained in the host cell by inclusion of a selection marker gene in the plasmid. This may be an antibiotic resistance gene(s), where the corresponding antibiotic(s) is added to the fermentation medium, or any other type of selection marker gene known in the art, e.g., a prototrophy-restoring gene where the plasmid is used in a host cell that is auxotrophic for the corresponding trait, e.g., a biocatalytic trait such as an amino acid biosynthesis or a nucleotide biosynthesis trait, or a carbon source utilization trait.

The promoters used in accordance with the present invention may be constitutive promoters or regulated promoters. Common examples of useful regulated promoters include those of the family derived from the lac promoter (i.e. the lacZ promoter), especially the tac and trc promoters described in U.S. Pat. No. 4,551,433 to DeBoer, as well as Ptac16, Ptac17, PtacII, PlacUV5, and the T7lac promoter. In one embodiment, the promoter is not derived from the host cell organism. In certain embodiments, the promoter is derived from an E. coli organism.

Common examples of non-lac-type promoters useful in expression systems according to the present invention include, e.g., those listed in Table 6.

TABLE 6
Examples of non-lac Promoters

Promoter    Inducer
P_(R)       High temperature
P_(L)       High temperature
Pm          Alkyl- or halo-benzoates
Pu          Alkyl- or halo-toluenes
Psal        Salicylates

See, e.g.: J. Sanchez-Romero & V. De Lorenzo (1999) Manual of Industrial Microbiology and Biotechnology (A. Demain & J. Davies, eds.) pp. 460-74 (ASM Press, Washington, D.C.); H. Schweizer (2001) Current Opinion in Biotechnology, 12:439-445; and R. Slater & R. Williams (2000) Molecular Biology and Biotechnology (J. Walker & R. Rapley, eds.) pp. 125-54 (The Royal Society of Chemistry, Cambridge, UK). A promoter having the nucleotide sequence of a promoter native to the selected bacterial host cell may also be used to control expression of the transgene encoding the target polypeptide, e.g., a Pseudomonas anthranilate or benzoate operon promoter (Pant, Pben). Tandem promoters may also be used in which more than one promoter is covalently attached to another, whether the same or different in sequence, e.g., a Pant-Pben tandem promoter (interpromoter hybrid) or a Plac-Plac tandem promoter, or whether derived from the same or different organisms.

Regulated promoters utilize promoter regulatory proteins in order to control transcription of the gene of which the promoter is a part. Where a regulated promoter is used herein, a corresponding promoter regulatory protein will also be part of an expression system according to the present invention. Examples of promoter regulatory proteins include: activator proteins, e.g., E. coli catabolite activator protein, MalT protein; AraC family transcriptional activators; repressor proteins, e.g., E. coli LacI proteins; and dual-function regulatory proteins, e.g., E. coli NagC protein. Many regulated-promoter/promoter-regulatory-protein pairs are known in the art. In one embodiment, the expression construct for the target protein(s) and the heterologous protein of interest are under the control of the same regulatory element.

Promoter regulatory proteins interact with an effector compound, i.e. a compound that reversibly or irreversibly associates with the regulatory protein so as to enable the protein to either release or bind to at least one DNA transcription regulatory region of the gene that is under the control of the promoter, thereby permitting or blocking the action of a transcriptase enzyme in initiating transcription of the gene. Effector compounds are classified as either inducers or co-repressors, and these compounds include native effector compounds and gratuitous inducer compounds. Many regulated-promoter/promoter-regulatory-protein/effector-compound trios are known in the art. Although an effector compound can be used throughout the cell culture or fermentation, in a preferred embodiment in which a regulated promoter is used, after growth of a desired quantity or density of host cell biomass, an appropriate effector compound is added to the culture to directly or indirectly result in expression of the desired gene(s) encoding the protein or polypeptide of interest.

By way of example, where a lac family promoter is utilized, a lacI gene can also be present in the system. The lacI gene, which is (normally) a constitutively expressed gene, encodes the Lac repressor protein (LacI protein), which binds to the lac operator of these promoters. Thus, where a lac family promoter is utilized, the lacI gene can also be included and expressed in the expression system. In the case of the lac promoter family members, e.g., the tac promoter, the effector compound is an inducer, preferably a gratuitous inducer such as IPTG (isopropyl-β-D-1-thiogalactopyranoside, also called “isopropylthiogalactoside”).

For expression of a protein or polypeptide of interest, any plant promoter may also be used. A promoter may be a plant RNA polymerase II promoter. Elements included in plant promoters can be a TATA box or Goldberg-Hogness box, typically positioned approximately 25 to 35 basepairs upstream (5′) of the transcription initiation site, and the CCAAT box, located between 70 and 100 basepairs upstream. In plants, the CCAAT box may have a different consensus sequence than the functionally analogous sequence of mammalian promoters (Messing et al. (1983) In: Genetic Engineering of Plants, Kosuge et al., eds., pp. 211-227). In addition, virtually all promoters include additional upstream activating sequences or enhancers (Benoist and Chambon (1981) Nature 290:304-310; Gruss et al. (1981) Proc. Nat. Acad. Sci. 78:943-947; and Khoury and Gruss (1983) Cell 27:313-314) extending from around −100 bp to −1,000 bp or more upstream of the transcription initiation site.

Expression Systems

It may be desirable to target the protein or polypeptide of interest to the periplasm of one or more of the populations of host cells in the array, or into the extracellular space. In one embodiment, the expression vector further comprises a nucleotide sequence encoding a secretion signal sequence polypeptide operably linked to the nucleotide sequence encoding the protein or polypeptide of interest. In some embodiments, no modifications are made between the signal sequence and the protein or polypeptide of interest. However, in certain embodiments, additional cleavage signals are incorporated to promote proper processing of the amino terminal of the polypeptide.

The vector can have any of the characteristics described above. In one embodiment, the vector comprising the coding sequence for the protein or polypeptide of interest further comprises a signal sequence, e.g., a secretion signal sequence.

Therefore, in one embodiment, this isolated polypeptide is a fusion protein of the secretion signal and a protein or polypeptide of interest. However, the secretion signal can also be cleaved from the protein when the protein is targeted to the periplasm. In one embodiment, the linkage between the Sec system secretion signal and the protein or polypeptide is modified to increase cleavage of the secretion signal.

Secretion signals useful in the compositions and methods of the present invention are known in the art and are provided herein and in U.S. Pat. App. Pub. Nos. 2006/0008877 and 2008/0193974, both incorporated herein by reference in their entirety. These sequences can promote the targeting of an operably linked polypeptide of interest to the periplasm of Gram-negative bacteria or into the extracellular environment. Use of secretion signal leader sequences can increase production of recombinant proteins in bacteria that produce improperly folded, aggregated or inactive proteins. Additionally, many types of proteins require secondary modifications that are inefficiently achieved using known methods. Secretion leader utilization can increase the harvest of properly folded proteins by secreting the protein from the intracellular environment. In Gram-negative bacteria, a protein secreted from the cytoplasm can end up in the periplasmic space, attached to the outer membrane, or in the extracellular broth. These methods also avoid formation of inclusion bodies, which constitute aggregated proteins. Secretion of proteins into the periplasmic space also has the well-known effect of facilitating proper disulfide bond formation (Bardwell et al. (1994) Phosphate Microorg. 270-5; Manoil (2000) Methods in Enzymol. 326: 35-47). Other benefits of secretion of recombinant protein include: more efficient isolation of the protein; proper folding and disulfide bond formation of the transgenic protein, leading to an increase in yield represented by, e.g., the percentage of the protein in active form, reduced formation of inclusion bodies and reduced toxicity to the host cell, and an increased percentage of the recombinant protein in soluble form. The potential for excretion of the protein of interest into the culture medium can also potentially promote continuous, rather than batch, culture for protein production.

Certain secretion leader sequences useful in the compositions and methods of the present invention are shown in Table 7 below. As understood by those of skill in the art, these sequences and others described in the art can retain function or have improved function when amino acid changes are made. Furthermore, it is understood that the nucleic acid sequences encoding these leaders can in some cases vary without effect on the function of the leader. Additional leader sequences are provided in the sequence listings.

TABLE 7
Exemplary Leader Sequences

Leader Sequence                                         Abbrev    Amino Acid Sequence                      SEQ ID NO
Porin E1                                                PorE      MKKSTLAVAVTLGAIAQQAGA
Outer membrane porin F                                  OprF      MKLKNTLGLAIGSLIAATSFGVLA
Periplasmic phosphate binding protein                   Pbp       MKLKRLMAAMTFVAAGVATANAVA
Azurin                                                  Azu       MFAKLVAVSLLTLASGQLLA
Lipoprotein B RXF04720                                  Lip       MIKRNLLVMGLAVLLSA
Lysine-arginine-ornithine-binding protein               Lao       MQNYKKFLLAAAVSMAFSATAMA
Iron(III) binding protein                               Ibp       MIRDNRLKTSLLRGLTLTLLSLTLLSPAAHS
PB signal sequence mutant                               Pbp-A20V  MKLKRLMAAMTFVAAGVATVNAVA
DsbA                                                    DsbA      MRNLILSAALVTASLFGMTAQA
DsbC                                                    DsbC      MRLTQIIAAAAIALVSTFALA
tolB (PCR amplified from MB214 genomic)                 tolB      MRNLLRGMLVVICCMAGIAAA
Tetratricopeptide repeat family protein                 tpr       MNRSSALLLAFVFLSGCQAMA
Methyl-accepting chemotaxis protein                               MSLRNMNIAPRAFLGFAFIGALMLLLGVFALNQMSKIRA
Toluene tolerance protein ttg2C                         ttg2C     MQNRTVEIGVGLFLLAGILALLLLALRVSGLSA
FlgI RXF05262                                           FlgI      MKFKQLMAMALLLALSAVAQA
EcpD, CupC2 bacterial pili assembly chaperone RXF04554  cupC2     MPPRSIAACLGLLGLLMATQAAA
EcpD, CupB2 RXF05310                                    cupB2     MLFRTLLASLTFAVIAGLPSTAHA
EcpD, CupA2 RXF04296                                    cupA2     MSCTRAFKPLLLIGLATLMCSHAFA
NikA Periplasmic dipeptide transport protein RXF08966   nikA      MRLAALPLLLAPLFIAPMAVA
Bce (Bacillus coagulans)                                bce       MSTRIPRRQWLKGASGLLAAASLGRLANREARA
IBP S31A                                                          MIRDNRLKTSLLRGLTLTLLSLTLLSPAAHA
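As a purely illustrative sketch, the relationship between a secretion leader, the fusion precursor, and the mature protein left after leader cleavage can be modeled with string operations. The leader sequences are copied from Table 7; the function names and the placeholder mature sequence are hypothetical, and the code models only the sequence bookkeeping, not the biology of secretion or signal peptidase recognition.

```python
# Leader sequences copied from Table 7 (abbreviation -> amino acid sequence).
LEADERS = {
    "PorE": "MKKSTLAVAVTLGAIAQQAGA",
    "Azu": "MFAKLVAVSLLTLASGQLLA",
    "DsbA": "MRNLILSAALVTASLFGMTAQA",
}


def fuse_leader(leader_abbrev, mature_protein):
    """Return the precursor: the leader joined directly to the mature protein."""
    return LEADERS[leader_abbrev] + mature_protein


def cleave_leader(leader_abbrev, precursor):
    """Return the mature protein after removing the named leader from the N-terminus."""
    leader = LEADERS[leader_abbrev]
    if not precursor.startswith(leader):
        raise ValueError(f"precursor does not start with the {leader_abbrev} leader")
    return precursor[len(leader):]


if __name__ == "__main__":
    mature = "ACDEFGHIK"  # hypothetical placeholder, not a real protein of interest
    for abbrev in LEADERS:
        precursor = fuse_leader(abbrev, mature)
        print(abbrev, precursor, "->", cleave_leader(abbrev, precursor))
```

Building one precursor per leader in this way mirrors the idea of testing the same protein of interest behind several different secretion leaders.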

In embodiments, the expression vector contains an optimal ribosome binding sequence. Modulating translation strength by altering the translation initiation region of a protein of interest can be used to improve the production of heterologous cytoplasmic proteins that accumulate mainly as inclusion bodies due to a translation rate that is too rapid. Secretion of heterologous proteins into the periplasmic space of bacterial cells can also be enhanced by optimizing rather than maximizing protein translation levels such that the translation rate is in sync with the protein secretion rate.

The translation initiation region has been defined as the sequence extending immediately upstream of the ribosomal binding site (RBS) to approximately 20 nucleotides downstream of the initiation codon (McCarthy et al. (1990) Trends in Genetics 6:78-85, herein incorporated by reference in its entirety). In prokaryotes, alternative RBS sequences can be utilized to optimize translation levels of heterologous proteins by providing translation rates that are decreased with respect to the translation levels using the canonical, or consensus, RBS sequence (AGGAGG; SEQ ID NO:1) described by Shine and Dalgarno (Proc. Natl. Acad. Sci. USA 71:1342-1346, 1974). By “translation rate” or “translation efficiency” is intended the rate of mRNA translation into proteins within cells. In most prokaryotes, the Shine-Dalgarno sequence assists with the binding and positioning of the 30S ribosome component relative to the start codon on the mRNA through interaction with a pyrimidine-rich region of the 16S ribosomal RNA. The RBS (also referred to herein as the Shine-Dalgarno sequence) is located on the mRNA downstream from the start of transcription and upstream from the start of translation, typically from 4 to 14 nucleotides upstream of the start codon, and more typically from 8 to 10 nucleotides upstream of the start codon. Because of the role of the RBS sequence in translation, there is a direct relationship between the efficiency of translation and the efficiency (or strength) of the RBS sequence.
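The positional relationship described above can be illustrated with a short sketch that searches a sequence for an RBS motif lying 4 to 14 nucleotides upstream of a start codon. The function name is hypothetical. Two assumptions are made for the example: the spacing is measured as the number of nucleotides between the last base of the RBS and the first base of the start codon, and the sequence is written in the DNA alphabet with ATG as the start codon.

```python
def find_rbs_spacings(seq, rbs="AGGAGG", start_codon="ATG", min_gap=4, max_gap=14):
    """Return (rbs_index, start_index, gap) for each RBS/start-codon pair in range.

    gap is the number of nucleotides between the end of the RBS motif and the
    first base of the start codon (an assumed convention for this sketch).
    """
    seq = seq.upper()
    hits = []
    start = seq.find(start_codon)
    while start != -1:
        for gap in range(min_gap, max_gap + 1):
            rbs_index = start - gap - len(rbs)
            if rbs_index >= 0 and seq[rbs_index:rbs_index + len(rbs)] == rbs:
                hits.append((rbs_index, start, gap))
        start = seq.find(start_codon, start + 1)
    return hits


if __name__ == "__main__":
    # Hypothetical test sequence: consensus RBS, an 8-nucleotide spacer, then ATG.
    example = "TTTAGGAGGACAGCTATATGAAA"
    print(find_rbs_spacings(example))
```

Narrowing min_gap and max_gap to 8 and 10 restricts the search to the more typical spacing named in the text.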

In some embodiments, modification of the RBS sequence results in a decrease in the translation rate of the heterologous protein. This decrease in translation rate may correspond to an increase in the level of properly processed protein or polypeptide per gram of protein produced, or per gram of host protein. The decreased translation rate can also correlate with an increased level of recoverable protein or polypeptide produced per gram of recombinant or per gram of host cell protein. The decreased translation rate can also correspond to any combination of an increased expression, increased activity, increased solubility, or increased translocation (e.g., to a periplasmic compartment or secreted into the extracellular space). In this embodiment, the term “increased” is relative to the level of protein or polypeptide that is produced, properly processed, soluble, and/or recoverable when the protein or polypeptide of interest is expressed under the same conditions, and wherein the nucleotide sequence encoding the polypeptide comprises the canonical RBS sequence. Similarly, the term “decreased” is relative to the translation rate of the protein or polypeptide of interest wherein the gene encoding the protein or polypeptide comprises the canonical RBS sequence. The translation rate can be decreased by at least about 5%, at least about 10%, at least about 15%, at least about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, at least about 75% or more, or at least about 2-fold, about 3-fold, about 4-fold, about 5-fold, about 6-fold, about 7-fold, or greater.

In some embodiments, the RBS sequence variants described herein can be classified as resulting in high, medium, or low translation efficiency. In one embodiment, the sequences are ranked according to the level of translational activity compared to translational activity of the canonical RBS sequence. A high RBS sequence has about 60% to about 100% of the activity of the canonical sequence. A medium RBS sequence has about 40% to about 60% of the activity of the canonical sequence. A low RBS sequence has less than about 40% of the activity of the canonical sequence.
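The ranking described above amounts to a threshold rule, sketched below. The function name is hypothetical, and because the text gives approximate, touching ranges, two choices here are assumptions: a value of exactly 60% is ranked High and a value of exactly 40% is ranked Medium. Values above 100% are also treated as High.

```python
def classify_rbs(percent_of_canonical):
    """Rank an RBS variant by its activity as a percentage of the canonical RBS.

    Thresholds follow the approximate ranges in the text; the handling of the
    exact boundary values (40 and 60) is an assumption of this sketch.
    """
    if percent_of_canonical < 0:
        raise ValueError("activity percentage cannot be negative")
    if percent_of_canonical >= 60:
        return "High"
    if percent_of_canonical >= 40:
        return "Medium"
    return "Low"


if __name__ == "__main__":
    for activity in (100, 50, 20):
        print(f"{activity}% of canonical activity: {classify_rbs(activity)}")
```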

Examples of RBS sequences are shown in Table 8. The sequences were screened for translational strength using COP-GFP as a reporter gene and ranked according to percentage of consensus RBS fluorescence. Each RBS variant was placed into one of three general fluorescence ranks: High (“Hi”; 100% of consensus RBS fluorescence), Medium (“Med”; 46-51% of consensus RBS fluorescence), and Low (“Lo”; 16-29% of consensus RBS fluorescence).

TABLE 8
RBS Sequences

RBS         Sequence    Rank
Consensus   AGGAGG      High
RBS2        GGAGCG      Med
RBS34       GGAGCG      Med
RBS41       AGGAGT      Med
RBS43       GGAGTG      Med
RBS48       GAGTAA      Low
RBS1        AGAGAG      Low
RBS35       AAGGCA      Low
RBS49       CCGAAC      Low

Methods for identifying optimal ribosome binding sites are described in U.S. Pat. App. No. 2009/062143, “Translation initiation region sequences for optimal expression of heterologous proteins,” incorporated herein by reference in its entirety.

One or more genes encoding heterologous proteins can be expressed from the same expression vector, as desired. For example, one might choose to express an antibody heavy chain and light chain from the same vector. The same promoter and regulatory sequences can be used to drive expression of both genes (e.g., in tandem), or the genes can be expressed separately on the same expression vector. In embodiments of the invention, at least two genes are encoded on separate expression vectors within the same expression system. The at least two genes can be related or unrelated.

In the context of the array, it can be convenient and informative to test the expression of a group of heterologous proteins in parallel in the same array. This can be accomplished by providing several series of expression systems. One series contains expression vectors encoding at least one heterologous protein to be compared with at least one other heterologous protein in another series of expression systems. For example, a group of variants of the same protein can be tested on the same array in several series of expression systems. In each series of expression systems, the expression vector encodes the same variant. Such an approach could also be useful for testing a library of binding proteins, e.g., antibodies. In embodiments, the proteins tested in parallel are related; in others, they are not.
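One way to picture several series of expression systems on the same array is to enumerate every pairing of host cell population and expression construct and assign each pairing a position. The sketch below is illustrative only: the function name, the strain and variant labels, and the 8-row by 12-column plate format are assumptions, not features taken from the text.

```python
from itertools import product
from string import ascii_uppercase


def layout_array(host_strains, constructs, rows=8, columns=12):
    """Map well labels (e.g. 'A1') to (host strain, construct) pairs.

    Each construct forms one series: all host strains carrying that construct
    are placed in consecutive wells before the next series begins.
    """
    pairs = list(product(constructs, host_strains))
    if len(pairs) > rows * columns:
        raise ValueError("too many combinations for the assumed plate format")
    layout = {}
    for index, (construct, strain) in enumerate(pairs):
        well = f"{ascii_uppercase[index // columns]}{index % columns + 1}"
        layout[well] = (strain, construct)
    return layout


if __name__ == "__main__":
    # Hypothetical labels for three host populations and two protein variants.
    plate = layout_array(["host_1", "host_2", "host_3"], ["variant_A", "variant_B"])
    for well, (strain, construct) in plate.items():
        print(well, strain, construct)
```

Keeping each series contiguous makes it straightforward to compare the same host population across variants by reading wells at a fixed offset.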

Prior to cloning into an expression vector, the protein coding sequence can be optimized if desired. The sequence is cloned into a series of expression vectors containing, e.g., secretion leader sequences and other appropriate promoters or regulatory sequences as described herein. These sequence elements can be selected based on an analysis of the heterologous protein amino acid sequence as described herein.

The CHAMPION™ pET expression system provides a high level of protein production. Expression is induced from the strong T7lac promoter. This system takes advantage of the high activity and specificity of the bacteriophage T7 RNA polymerase for high level transcription of the gene of interest. The lac operator located in the promoter region provides tighter regulation than traditional T7-based vectors, improving plasmid stability and cell viability (Studier and Moffatt (1986) J Molecular Biology 189(1): 113-30; Rosenberg, et al. (1987) Gene 56(1): 125-35). The T7 expression system uses the T7 promoter and T7 RNA polymerase (T7 RNAP) for high-level transcription of the gene of interest. High-level expression is achieved in T7 expression systems because the T7 RNAP is more processive than native E. coli RNAP and is dedicated to the transcription of the gene of interest. Expression of the identified gene is induced by providing a source of T7 RNAP in the host cell. This is accomplished by using a BL21 E. coli host containing a chromosomal copy of the T7 RNAP gene. The T7 RNAP gene is under the control of the lacUV5 promoter, which can be induced by IPTG. T7 RNAP is expressed upon induction and transcribes the gene of interest.

The pBAD expression system allows tightly controlled, titratable expression of protein or polypeptide of interest through the presence of specific carbon sources such as glucose, glycerol and arabinose (Guzman, et al. (1995) J Bacteriology 177(14): 4121-30). The pBAD vectors are uniquely designed to give precise control over expression levels. Heterologous gene expression from the pBAD vectors is initiated at the araBAD promoter. The promoter is both positively and negatively regulated by the product of the araC gene. AraC is a transcriptional regulator that forms a complex with L-arabinose. In the absence of L-arabinose, the AraC dimer blocks transcription. For maximum transcriptional activation two events are required: (i) L-arabinose binds to AraC, allowing transcription to begin, and (ii) the cAMP activator protein (CAP)-cAMP complex binds to the DNA and stimulates binding of AraC to the correct location of the promoter region.

The trc expression system allows high-level, regulated expression in E. coli from the trc promoter. The trc expression vectors have been optimized for expression of eukaryotic genes in E. coli. The trc promoter is a strong hybrid promoter derived from the tryptophan (trp) and lactose (lac) promoters. It is regulated by the lacO operator and the product of the lacIQ gene (Brosius, J. (1984) Gene 27(2): 161-72).

Transformation of the host cells with the vector(s) disclosed herein may be performed using any transformation methodology known in the art, and the bacterial host cells may be transformed as intact cells or as protoplasts (i.e. including cytoplasts). Exemplary transformation methodologies include poration methodologies, e.g., electroporation, protoplast fusion, bacterial conjugation, and divalent cation treatment, e.g., calcium chloride treatment or CaCl2/Mg2+ treatment, or other well known methods in the art. See, e.g., Morrison, J. Bact., 132:349-351 (1977); Clark-Curtiss & Curtiss, Methods in Enzymology, 101:347-362 (Wu et al., eds, 1983); Sambrook et al., Molecular Cloning, A Laboratory Manual (2nd ed. 1989); Kriegler, Gene Transfer and Expression: A Laboratory Manual (1990); and Current Protocols in Molecular Biology (Ausubel et al., eds., 1994).

Proteins of Interest

The methods and compositions of the present invention are useful for identifying a P. fluorescens strain that is optimal for producing high levels of a properly processed protein or polypeptide of interest. The arrays are useful for screening for production of a protein or polypeptide of interest of any species and of any size. However, in certain embodiments, the protein or polypeptide of interest is a therapeutically useful protein or polypeptide. In some embodiments, the protein can be a mammalian protein, for example a human protein, and can be, for example, a growth factor, a cytokine, a chemokine or a blood protein. The protein or polypeptide of interest can be processed in a similar manner to the native protein or polypeptide. In certain embodiments, the protein or polypeptide of interest is less than 100 kD, less than 50 kD, or less than 30 kD in size. In certain embodiments, the protein or polypeptide of interest is a polypeptide of at least about 5, 10, 15, 20, 30, 40, 50 or 100 or more amino acids.

The coding sequence for the protein or polypeptide of interest can be a native coding sequence for the polypeptide, if available, but will more preferably be a coding sequence that has been selected, improved, or optimized for use in an expressible form in the strains of the array: for example, by optimizing the gene to reflect the codon use bias of a Pseudomonas species such as P. fluorescens or other suitable organism. For gene optimization, one or more rare codons may be removed to avoid ribosomal stalling and minimize amino acid misincorporation. One or more gene-internal ribosome binding sites may also be eliminated to avoid truncated protein products. Long stretches of C and G nucleotides may be removed to avoid RNA polymerase slippage that could result in frame-shifts. Strong gene-internal stem-loop structures, especially the ones covering the ribosome binding site, may also be eliminated.
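Two of the checks mentioned above, flagging long stretches of C and G nucleotides and flagging gene-internal ribosome binding sites, can be sketched as simple sequence scans. The function names are hypothetical. The minimum run length of 6 and the use of the consensus AGGAGG motif as the only internal RBS pattern are assumptions chosen for the example; the text does not specify either.

```python
import re


def long_gc_runs(seq, min_len=6):
    """Return (position, run) for each stretch of C/G at least min_len long."""
    pattern = re.compile(rf"[GC]{{{min_len},}}")
    return [(m.start(), m.group()) for m in pattern.finditer(seq.upper())]


def internal_rbs_sites(coding_seq, motif="AGGAGG"):
    """Return every position where the RBS-like motif occurs inside the sequence."""
    seq = coding_seq.upper()
    return [i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)]


if __name__ == "__main__":
    # Hypothetical coding sequence containing one long G/C run and one AGGAGG motif.
    example = "ATGAAAGGCCGGCCGCTTAGGAGGTTTTAA"
    print("G/C runs:", long_gc_runs(example))
    print("internal RBS-like sites:", internal_rbs_sites(example))
```

Flagged positions would then be candidates for synonymous codon substitutions; choosing replacements requires a codon usage table for the host, which is outside this sketch.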

In other embodiments, the protein when produced also includes an additional targeting sequence, for example a sequence that targets the protein to the periplasm or the extracellular medium. In one embodiment, the additional targeting sequence is operably linked to the carboxy-terminus of the protein. In another embodiment, the protein includes a secretion signal for an autotransporter, a two partner secretion system, a main terminal branch system or a fimbrial usher porin.

The gene(s) that result are constructed within or are inserted into one or more vectors, and then transformed into each of the host cell populations in the array. Nucleic acid or a polynucleotide said to be provided in an “expressible form” means nucleic acid or a polynucleotide that contains at least one gene that can be expressed by one or more of the host cell populations of the invention.

Extensive sequence information required for molecular genetics and genetic engineering techniques is widely publicly available. Access to complete nucleotide sequences of mammalian, as well as human, genes, cDNA sequences, amino acid sequences and genomes can be obtained from GenBank at the website www.ncbi.nlm.nih.gov/Entrez. Additional information can also be obtained from GeneCards, an electronic encyclopedia integrating information about genes and their products and biomedical applications from the Weizmann Institute of Science Genome and Bioinformatics (bioinformatics.weizmann.ac.il/cards). Nucleotide sequence information can also be obtained from the EMBL Nucleotide Sequence Database (www.ebi.ac.uk/embl/) or the DNA Data Bank of Japan (DDBJ, www.ddbj.nig.ac.jp/). Additional sites for information on amino acid sequences include Georgetown's Protein Information Resource website (www-nbrf.georgetown.edu/pir/) and Swiss-Prot (au.expasy.org/sprot/sprot-top.html).

Examples of proteins that can be expressed in this invention include molecules such as, e.g., renin, a growth hormone, including human growth hormone; bovine growth hormone; growth hormone releasing factor; parathyroid hormone; thyroid stimulating hormone; lipoproteins; α-1-antitrypsin; insulin A-chain; insulin B-chain; proinsulin; thrombopoietin; follicle stimulating hormone; calcitonin; luteinizing hormone; glucagon; clotting factors such as factor VIIIC, factor IX, tissue factor, and von Willebrands factor; anti-clotting factors such as Protein C; atrial natriuretic factor; lung surfactant; a plasminogen activator, such as urokinase or human urine or tissue-type plasminogen activator (t-PA); bombesin; thrombin; hemopoietic growth factor; tumor necrosis factor-alpha and -beta; enkephalinase; a serum albumin such as human serum albumin; mullerian-inhibiting substance; relaxin A-chain; relaxin B-chain; prorelaxin; mouse gonadotropin-associated polypeptide; a microbial protein, such as beta-lactamase; DNase; inhibin; activin; vascular endothelial growth factor (VEGF); receptors for hormones or growth factors; integrin; protein A or D; rheumatoid factors; a neurotrophic factor such as brain-derived neurotrophic factor (BDNF), neurotrophin-3, -4, -5, or -6 (NT-3, NT-4, NT-5, or NT-6), or a nerve growth factor such as NGF-β; cardiotrophins (cardiac hypertrophy factor) such as cardiotrophin-1 (CT-1); platelet-derived growth factor (PDGF); fibroblast growth factor such as aFGF and bFGF; epidermal growth factor (EGF); transforming growth factor (TGF) such as TGF-alpha and TGF-β, including TGF-β1, TGF-β2, TGF-β3, TGF-β4, or TGF-β5; insulin-like growth factor-I and -II (IGF-I and IGF-II); des(1-3)-IGF-I (brain IGF-I), insulin-like growth factor binding proteins; CD proteins such as CD-3, CD-4, CD-8, and CD-19; erythropoietin; osteoinductive factors; immunotoxins; a bone morphogenetic protein (BMP); an interferon such as interferon-alpha, -beta, and -gamma; colony stimulating factors (CSFs), e.g., M-CSF, GM-CSF, and G-CSF; interleukins (ILs), e.g., IL-1 to IL-10; anti-HER-2 antibody; superoxide dismutase; T-cell receptors; surface membrane proteins; decay accelerating factor; viral antigen such as, for example, a portion of the AIDS envelope; transport proteins; homing receptors; addressins; regulatory proteins; antibodies; and fragments of any of the above-listed polypeptides.

In certain embodiments, the protein or polypeptide can be selected from IL-1, IL-1a, IL-1b, IL-2, IL-3, IL-4, IL-5, IL-6, IL-7, IL-8, IL-9, IL-10, IL-11, IL-12, IL-12elasti, IL-13, IL-15, IL-16, IL-18, IL-18BPa, IL-23, IL-24, VIP, erythropoietin, GM-CSF, G-CSF, M-CSF, platelet derived growth factor (PDGF), MSF, FLT-3 ligand, EGF, fibroblast growth factor (FGF; e.g., α-FGF (FGF-1), β-FGF (FGF-2), FGF-3, FGF-4, FGF-5, FGF-6, or FGF-7), insulin-like growth factors (e.g., IGF-1, IGF-2); tumor necrosis factors (e.g., TNF, Lymphotoxin), nerve growth factors (e.g., NGF), vascular endothelial growth factor (VEGF); interferons (e.g., IFN-α, IFN-β, IFN-γ); leukemia inhibitory factor (LIF); ciliary neurotrophic factor (CNTF); oncostatin M; stem cell factor (SCF); transforming growth factors (e.g., TGF-α, TGF-β1, TGF-β2, TGF-β3); TNF superfamily (e.g., LIGHT/TNFSF14, STALL-1/TNFSF13B (BLyS, BAFF, THANK), TNFalpha/TNFSF2 and TWEAK/TNFSF12); or chemokines (BCA-1/BLC-1, BRAK/Kec, CXCL16, CXCR3, ENA-78/LIX, Eotaxin-1, Eotaxin-2/MPIF-2, Exodus-2/SLC, Fractalkine/Neurotactin, GROalpha/MGSA, HCC-1, I-TAC, Lymphotactin/ATAC/SCM, MCP-1/MCAF, MCP-3, MCP-4, MDC/STCP-1/ABCD-1, MIP-1α, MIP-1β, MIP-2α/GROβ, MIP-3α/Exodus/LARC, MIP-3/Exodus-3/ELC, MIP-4/PARC/DC-CK1, PF-4, RANTES, SDF1, TARC, TECK, microbial toxins, ADP ribosylating toxins, microbial or viral antigens).

In one embodiment of the present invention, the protein of interest can be a multi-subunit protein or polypeptide. Multisubunit proteins that can be expressed include homomeric and heteromeric proteins. The multisubunit proteins may include two or more subunits that may be the same or different. For example, the protein may be a homomeric protein comprising 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 or more subunits. The protein also may be a heteromeric protein including 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, or more subunits. Exemplary multisubunit proteins include: receptors including ion channel receptors; extracellular matrix proteins including chondroitin; collagen; immunomodulators including MHC proteins, full chain antibodies, and antibody fragments; enzymes including RNA polymerases, and DNA polymerases; and membrane proteins.

In another embodiment, the protein of interest can be a blood protein. The blood proteins expressed in this embodiment include but are not limited to carrier proteins, such as albumin, including human and bovine albumin, transferrin, recombinant transferrin half-molecules, haptoglobin, fibrinogen and other coagulation factors, complement components, immunoglobulins, enzyme inhibitors, precursors of substances such as angiotensin and bradykinin, insulin, endothelin, and globulin, including alpha, beta, and gamma-globulin, and other types of proteins, polypeptides, and fragments thereof found primarily in the blood of mammals. The amino acid sequences for numerous blood proteins have been reported (see, S. S. Baldwin (1993) Comp. Biochem Physiol. 106b:203-218), including the amino acid sequence for human serum albumin (Lawn, L. M., et al. (1981) Nucleic Acids Research, 9:6103-6114) and human serum transferrin (Yang, F. et al. (1984) Proc. Natl. Acad. Sci. USA 81:2752-2756).

In another embodiment, the protein of interest can be an enzyme or co-factor. The enzymes and co-factors expressed in this embodiment include but are not limited to aldolases, amine oxidases, amino acid oxidases, aspartases, B12 dependent enzymes, carboxypeptidases, carboxyesterases, carboxylyases, chymotrypsin, CoA requiring enzymes, cyanohydrin synthetases, cystathionine synthases, decarboxylases, dehydrogenases, alcohol dehydrogenases, dehydratases, diaphorases, dioxygenases, enoate reductases, epoxide hydrases, fumarases, galactose oxidases, glucose isomerases, glucose oxidases, glycosyltransferases, methyltransferases, nitrile hydrases, nucleoside phosphorylases, oxidoreductases, oxynitrilases, peptidases, glycosyltransferases, peroxidases, enzymes fused to a therapeutically active polypeptide, tissue plasminogen activator; urokinase, reptilase, streptokinase; catalase, superoxide dismutase; DNase, amino acid hydrolases (e.g., asparaginase, amidohydrolases); carboxypeptidases; proteases, trypsin, pepsin, chymotrypsin, papain, bromelain, collagenase; neuraminidase; lactase, maltase, sucrase, and arabinofuranosidases.

In another embodiment, the protein of interest can be a single chain, Fab fragment and/or full chain antibody or fragments or portions thereof. A single-chain antibody can include the antigen-binding regions of antibodies on a single stably-folded polypeptide chain. Fab fragments can be a piece of a particular antibody. The Fab fragment can contain the antigen binding site. The Fab fragment can contain 2 chains: a light chain and a heavy chain fragment. These fragments can be linked via a linker or a disulfide bond.

In other embodiments, the protein of interest is a protein that is active at a temperature from about 20 to about 42° C. In one embodiment, the protein is active at physiological temperatures and is inactivated when heated to high or extreme temperatures, such as temperatures over 65° C.

In one embodiment, the protein of interest is a protein that is active at a temperature from about 20 to about 42° C., and/or is inactivated when heated to high or extreme temperatures, such as temperatures over 65° C.; is, or is substantially homologous to, a native protein, such as a native mammalian or human protein; and is not expressed from nucleic acids in concatameric form, where the promoter is not a native promoter in the host cell used in the array but is derived from another organism, such as E. coli.

The heterologous protein(s) expressed using the compositions and methods of the invention can be any protein for which overexpression is desired, e.g., a protein that has been found to be difficult to express. Such a protein may have been found to form inclusion bodies, aggregate, be degraded, or otherwise be produced in an unsatisfactory manner in previous attempts at overexpression. The protein may have been predicted to be insoluble based on analysis of the amino acid sequence. The propensity for a protein to be insoluble can be evaluated using prediction tools known to those of skill in the art. Prediction tools include, e.g., PROSO, described by Smialowski, et al., 2007, “Protein solubility: sequence based prediction and experimental verification,” Bioinformatics 23(19):2536. PROSO can be used to assess the chance that a protein will be soluble upon heterologous expression in E. coli. The sequence-based approach classifies proteins as “soluble” or “insoluble.” A web server for protein solubility prediction is available at http://webclu.bio.wzw.tum.de:8080/proso. Another tool is SOLpro, described by Magnan, et al., 2009, “SOLpro: accurate sequence-based prediction of protein solubility,” Bioinformatics 25(17):2200-2207. SOLpro predicts the propensity of a protein to be soluble upon overexpression in E. coli. It is integrated in the SCRATCH suite of predictors and is available for download as a standalone application and as a web server at: http://scratchproteomics.ics.uci.edu.

Table 9 lists exemplary heterologous proteins that can be expressed using the methods and arrays of the present invention, and includes examples of references and sequence information relating to proteins listed. The lists of exemplary proteins and exemplary sequences provided, in Table 8 and elsewhere herein, are in no way intended to be limiting. It is understood that the compositions and methods of the invention can be used in the expression of any desired protein.

TABLE 9 Exemplary Heterologous Proteins Exemplary References/SequencesProtein Class Exemplary Protein (incorporated herein by reference)Vertebrate and ω-Agatoxin Swiss-Prot Acc. No. P15969 (omega InvertebrateAnimal μ-Agatoxin agatoxin 1A) Toxins Swiss-Prot: P15970 (omega agatoxin1B) Agitoxin Allopumiliotoxin 267A ω-Atracotoxin-HV1 δ-Atracotoxin-Hv1bBatrachotoxin (Dendrobatidae frogs) Botrocetin (Bothrops jararaca)Usami, et al., 1993, “Primary structure of two-chain botrocetin, a vonWillebrand factor modulator purified from the venom of Bothropsjararaca,” Proc. Natl. Acad. Sci. USA 90: 928-932 Bufotoxins(Arenobufagin, Bufotalin, Bufotenin · Cinobufagin, Marinobufagin)Bungarotoxin (Alpha-Bungarotoxin, Beta-Bungarotoxin) CalcicludineCalciseptine Cardiotoxin III Catrocollastatin C (Crotalus atrox)Calvete, et al., 2000, “The disulfide bond pattern of catrocollastatinC, a disintegrin- like/cysteine-rich protein isolated from Crotalusatrox venom,” Protein Science 9: 1365-1373 Charybdotoxin Ciguatera Cobravenom cytotoxins Chiou, et al., 1993, “Cobra venom cardiotoxin(cytotoxin) isoforms and neurotoxin: Comparative potency of proteinkinase C inhibition and cancer cell cytotoxicity and modes of enzymeinhibition,” Biochemistry, 32 (8), pp 2062-2067 Conotoxin Echinoidin(Anthocidaris crassispina) Eledoisin Epibatidine Fibrolase (Agkistrodoncontortrix Randolph, et al., 1992, “Amino acid contortrix) sequence offibrolase, a direct-acting fibrinolytic enzyme from Agkistrodoncontortrix contortrix venom,” Protein Science 1 590-600 HefutoxinHistrionicotoxin Huwentoxin-I Huwentoxin-II (Selenocosmia Shu, et al.,2002, “The structure of spider huwena) toxin huwentoxin-II with uniquedisulfide linkage: Evidence for structural evolution,” Protein Science11: 245-252 J-ACTX-Hv1c Kunitz-Type Toxins, e.g. 
Yuan, et al., 2008,“Discovery of a distinct Dendrotoxin-K, Dendrotoxin 1 superfamily ofKunitz-type toxin (KTT) from tarantulas,” PLoS one 3(10): e3414, doi:10.1371/journal.pone.0003414 Latrotoxin (Alpha-latrotoxin) MargatoxinMaurotoxin Onchidal PhTx3 Pumiliotoxin 251D Rattlesnake lectinRobustoxin Saxitoxin Scyllatoxin Slotoxin Stromatoxin TaicatoxinTarichatoxin Tetrodotoxin (e.g., toads, Tetraodontiformes fish,Naticidae sea snails, newts, Vibrio bacteria) Plant toxins Ricin(Ricinus communis) GenBank Nucleotide Acc. No. DQ661048 (Ricin A chain)Halling, et al., 1985, “Genomic cloning and characterization of a ricingene from Ricinus communis” Nucleic Acids Res. 13(22): 8019-33 (Sequenceon p. 8025) Gelonin (Gelonium multiflorum) GenBank Acc. No. L12243Fungal toxins Aflatoxin Amatoxin (Alpha-amanitin, Beta- amanitin,Gamma-amanitin, Epsilon-amanitin) Citrinin Cytochalasin ErgotamineFumagillin Fumonisin (Fumonisin B1, Fumonisin B2) Gliotoxin GenBank Acc.No. AAW03299 (gliotoxin) Gardiner, et al., 2005, “Bioinformatic andexpression analysis of the putative gliotoxin biosynthetic gene clusterof Aspergillus fumigatus, FEMS Microbiol. Lett. 248(2): 241-248Tsunawaki, et al., 2004, “Fungal metabolite gliotoxin inhibits assemblyof the human respiratory burst NADPH oxidase,” Infection and Immunity72(6): 3373-3382 Helvolic Acid Ibotenic acid Muscimol Ochratoxin PatulinSterigmatocystin Trichothecene Vomitoxin Zeranol Zearalenone Bacterialtoxins Bacillus anthracis toxins: e.g., Swiss-Prot Acc. No. P13423.2(rPA, Anthrax toxin, Adenylate cyclase, Protective Antigen) rPA Bacillusthuringiensis: Cry toxins GenBank accession numbers for Cry proteinslisted in, e.g., Table 1 of U.S. Pat. No. 6,642,030, “Nucleic acidcompositions encoding modified Bacillus thuringiensis coleopteran-toxiccrystal proteins” Bordetella pertussis: Pertussis toxin EMBL M13223(pertussis toxin operon of 5 Pertussis toxin variants ORFs) U.S. Pat.No. 5,085,862, “Genetic detoxification of pertussis toxin” U.S. Pat. 
No.5,165,927, “Composition with modified pertussis toxin” U.S. Pat. No.5,773,600, “DNA encoding pertussis toxin muteins” Clostridium botulinum:Botulinum Fischer, et al., 2007, “Crucial role of the toxins disulfidebridge between botulinum neurotoxin light and heavy chains in proteasetranslocation across membranes,” J. Biol. Chem. 282(40): 29604-11, Epub,Baldwin, et al., 2008, “Subunit vaccine against the seven serotypes ofbotulism,” Infection and Immunity 76(3): 1314-1318 Clostridiumdifficile: Toxin A, B Swiss-Prot Acc. No. P16154 (wild type Wild type,variants, mutants Toxin A, strain VPI) Swiss-Prot Acc. No. P18177 (wildtype Toxin B, strain VPI) US Pat. App. Pub. Nos. 2004/0028705 and2008/0107673, “Mutants of clostridium difficile toxin B and methods ofuse” Clostridium perfringens: Alpha toxin, Enterotoxin Clostridiumtetani: Tetanus toxin GenBank Acc. No. 1A8D_A U.S. Pat. No. 5,571,694,“Expression of tetanus toxin fragment C in yeast” U.S. Pat. No.6,372,225, “Tetanus toxin functional fragment antigen and tetanusvaccine” Schiavo, et al., 1990, “An intact interchain disulfide bond isrequired for the neurotoxicity of tetanus toxin,” Infection and Immunity58(12): 4136-4141 U.S. Pat. No. 7,556,817, “Clostridial toxinactivatable Clostridial toxins” Corynebacterium beta: Diphtheria GenBankAcc. No. K01722 (DT nucleotide) toxin (DT) GenBank Acc. No. AAA32182 (DTprotein) Greenfield, et al., 1983, “Nucleotide sequence of thestructural gene for diphtheria toxin carried by corynebacteriophagebeta,” Proc. Natl. Acad. Sci. U.S.A. 80(22): 6853-6857 Papini, et al.,1993, “Cell penetration of diphtheria toxin. Reduction of the interchaindisulfide bridge is the rate-limiting step of translocation in thecytosol,” J. Biol. Chem. 268(3): 1567-74 Diphtheria toxin variants,e.g., GenBank Acc. No. 1007216A (CRM197) CRM45, CRM176, CRM197 GenBankAcc. No. 1007216B (CRM45) U.S. Pat. No. 
7,585,942, “Diphtheria toxinvariant” Orr, et al., 1999, “Expression and Immunogenicity of a MutantDiphtheria Toxin Molecule, CRM 197, and Its Fragments in Salmonellatyphi Vaccine Strain CVD 908-htrA,” Infection and Immunity 67(8):4290-4294 Giannini, et al., 1984, “The amino-acid sequence of twonon-toxic mutants of diphtheria toxin: CRM45 and CRM197,” Nucleic AcidsResearch 12(10): 4063-4069 E. coli: GenBank Acc. No. AAA24685(Heat-labile Verotoxin/Shiga-like toxin enterotoxin A prepeptide)Heat-stable enterotoxin GenBank Acc. No. AAC60441 (Heat-labileHeat-labile enterotoxin enterotoxin B subunit; LTc B subunit)Enterotoxins Listeria monocytogenes: Listeriolysin O Mycobacteriumtuberculosis: Cord factor Pseudomonas exotoxin Salmonella endotoxin,exotoxin Shigella disinteriae: Shiga toxin Staphylococcus aureus:Alpha/beta/delta toxin Exfoliatin Toxin Toxic shock syndrome toxinEnterotoxins Leukocidin (Panton-Valentine leukocidin) Streptococcuspyogenes: Akao, et al., 1999, “Unique synthetic Streptolysin S peptidesstimulating streptolysin S production in streptococci,” J. Biochem.125(1): 27-30 Akao, et al., 1992, “Purification and characterization ofa peptide essential for formation of streptolysin S by Streptococcuspyogenes,” Infection and Immunity 60(11): 4777-4780 Vibrio cholerae:Cholera toxin GenBank Acc. No. ACH70471 Tsai, et al., 2002, “Unfoldedcholera toxin is transferred to the ER membrane and released fromprotein disulfide isomerase upon oxidation by Ero1,” J. Cell Biology159(2): 207-215 Toxin-like proteins “ClanTox: a classifier of shortanimal toxins,” Nucleic Acids Research 37, Web Server issue W363-W368doi: 10.1093/nar/gkp299. Cytokines Interferon alpha 2a Swiss-Prot P01563(Receptors and (mature form amino acids 24-188) Ligands) Interferonalpha 2b GenBank Acc. No. NP_000596 (mature form amino acids 24-188)U.S. Pat. No. 7,189,389, “Pharmaceutical composition of humaninterferon-alpha 2 and interferon- alpha 8 subtypes” Interferon betaGenBank Acc. No. ABS89222 U.S. 
Pat. No. 7,399,463, “HSA-freeformulations of interferon-beta” Interferon gamma GenBank Acc. No.NP_000610 (mature form aa 24-166) U.S. Pat. No. 7,524,931, “Full-lengthinterferon gamma polypeptide variants” U.S. Pat. No. 7,504,237,“Polynucleotides encoding interferon gamma polypeptides” Interleukin 1beta GenBank Acc. No. NP_000567 (mature form aa 117-269) Interleukin 6GenBank Acc. No. AAC41704 U.S. Pat. No. 7,560,112, “Anti-il-6antibodies, compositions, methods and uses” Tumor Necrosis FactorFamily, e.g., GenBank Acc. No. CAA26669.1 (human TNFα TNF-alpha) TNFβ(formerly LTα) (mature form aa 77-233) LTβ PCT WO 2005/103077 TRELLAmino acid sequences for human TNF, LT- FasL α, LT-β, FasL, TFRP, TRAIL,CD27L, CD40L CD30L, CD40L, and 4-1BBL and TRELL CD30L provided by, e.g.,U.S. Pat. No. 7,566,769,” CD27L Tumor necrosis factor related ligand”4-1BBL GenBank Acc. No. AAA61198 (human TNF-related apoptosis-inducingtumor necrosis factor) ligand (TRAIL) Wang, et al., 1985, “Molecularcloning of RANKL (also TRANCE) the complementary DNA for human tumorGITRL necrosis factor,” Science 228 (4696), 149-154 TNF-2 GenBank Acc.No. AAA61200 (human TFRP tumor necrosis factor) OX40L Nedospasov, etal., 1986, “Tandem arrangement of genes coding for tumor necrosis factor(TNF-alpha) and lymphotoxin (TNF-beta) in the human genome,” Cold SpringHarb. Symp. Quant. Biol. 51 Pt 1, 611-624 U.S. Pat. No. 7,544,519, “Fhma novel member of the TNF ligand supergene family: materials and methodsfor interaction modulators” Mouse and human RANKL sequences provided in,e.g., U.S. Pat. No. 7,411,050, “Monoclonal blocking antibody to humanRANKL” GenBank Acc. No. AB008426 (mouse RANKL) Yasuda, et al., 1998,“Osteoclast differentiation factor is a ligand forosteoprotegerin/osteoclastogenesis- inhibitory factor and is identicalto TRANCE/RANKL,” Proc. Natl. Acad. Sci. U.S.A. 
95(7), 3597-3602Anderson, et al., 1997, “A homologue of the TNF receptor and its ligandenhance T-cell growth and dendritic-cell function,” Nature 390 (6656),175-179 Antibodies/Antibody Modified anti-TNF-alpha antibody U.S. Pat.No. 6,015,557, “Tumor necrosis factor Derivatives Infliximab (Remicade)antagonists for the treatment of neurological disorders” Nagahira, etal., Humanization of a mouse neutralizing monoclonal antibody againsttumor necrosis factor-alpha (TNF-alpha), J Immunol Methods. 1999 Jan 1;222(1-2): 83-92 Knight, et al., Construction and initialcharacterization of a mouse-human chimeric anti-TNF antibody. MolImmunol. 1993 Nov; 30(16): 1443-53. Golimumab (Simponi) Adalimumab(Humira) Diabodies EP 0 404 097, “Bispecific and oligospecific, mono-and oligovalent receptors, production and applications thereof” WO93/11161, “Multivalent antigen- binding proteins” Hollinger et al.,Proc. Natl. Acad. Sci. USA, 90: 6444-6448 (1993)); Linear antibodiesU.S. Pat. No. 5,641,870, “Low pH hydrophobic interaction chromatographyfor antibody purification” Zapata et al., 1995, “Engineering linearF(ab′)2 fragments for efficient production in Escherichia coli andenhanced antiproliferative activity,” Protein Eng. 8(10): 1057-1062Nanobodies U.S. Pat. App. Pub. No. 2007/0178082 and Single-domainantibodies (e.g., 2009/0238829, “Stabilized single domain shark IgNAR orVNAR, camelid) antibodies” Heterospecific antibodies U.S. Pat. App. Pub.No. 2006/0149041 and Trivalent antibodies 2006/0149041 “Therapeuticpolypeptides, homologues thereof, fragments thereof and for use inmodulating platelet-mediated aggregation” U.S. Pat. App. Pub. No.2009/0252681 “Nanobodies and Polypeptides Against EGFR and IGF-IR” U.S.Pat. App. Pub. No. 2009/0074770, “Amino acid sequences that bind toserum proteins in a manner that is essentially independent of the pH,compounds comprising the same, and uses thereof” U.S. Pat. App. Pub. No.2009/0028880, “Serum albumin binding proteins” U.S. Pat. App. Pub. 
No.2009/0022721, 2007/0077249, and 2007/0237769 “Single domain antibodiesdirected against tumour necrosis factor-alpha and uses therefor” U.S.Pat. App. Pub. No. 2008/0267949 “Peptides capable of binding to serumproteins” U.S. Pat. App. Pub. No. 2008/0107601 “Nanobodies AgainstAmyloid-Beta and Polypeptides Comprising the Same for the Treatment ofDegenerative Neural Diseases Such as Alzheimer's Disease” U.S. Pat. App.Pub. No. 2008/0096223 “Methods And Assays For Distinguishing BetweenDifferent Forms Of Diseases And Disorders Characterized ByThrombocytopenia And/Or By Spontaneous Interaction Between VonWillebrand Factor (Vwf) And Platelets” U.S. Pat. App. Pub. No.2007/0269422 “Serum albumin binding proteins with long half-lives” U.S.Pat. App. Pub. No. 2006/0246477 and 2006/0211088 “Method for generatingvariable domain sequences of heavy chain antibodies” U.S. Pat. App. Pub.No. 2006/0115470 “Camelidae antibodies against immunoglobulin e and usethereof for the treatment of allergic disorders” Wesolowski, et al.,2009, “Single domain antibodies: promising experimental and therapeutictools in infection and immunity,” Med Microbiol Immunol. 198(3): 157-174US Pat. App. Pub. No. 2009/0148438, “Binding Moieties Based on SharkIgnar Domains” BiTE molecules U.S. Pat. No. 7,235,641, “Bispecificantibodies” U.S. Pat. No. 7,575,923 and 7,112,324, “CD19xCD3 specificpolypeptides and uses thereof” US Pat. App. Pub. No. 2006/0193852 “NovelCD19xCD3 specific polypeptides and uses thereof” US Pat. App. Pub. No.2007/0123479 “Pharmaceutical compositions comprising bispecificanti-cd3, anti-cd19 antibody constructs for the treatment of b-cellrelated disorders” US Pat. App. Pub. Nos. 2009/0226444 and 2009/0226432,“Pharmaceutical Antibody Compositions with Resistance To Soluble CEA”Domain antibodies (dAbs) U.S. Pat. No. 7,563,443, “Monovalent anti-CD40Lantibody polypeptides and compositions thereof” US Pat. App. Pub. 
No.2006/0062784 “Compositions monovalent for CD40L binding and methods ofuse” scFV GenBank Acc. No. CAA12399.1 Anti-beta-galactosidase GenBankAcc. No. CAA12398 Humanized/Modified antibodies U.S. Pat. App. Pub. No.2009/0191186, “Antibodies to the PcrV Antigen of Pseudomonas aeruginosa”Bebbington, et al., 2008, “Antibodies for the treatment of bacterialinfections: current experience and fugure prospects,” Current Opin. inBiotech. 19(6): 613-619 Growth Activin A (Inhibin A) Swiss-Prot Acc. No.P08476.2 (Inhibin beta Factors/Hormones A chain/Activin beta-A chain) US575751, “Activin-A mutants” Epidermal growth factor (EGF) Swiss-ProtAcc. No. P01133.2 (mature form aa 971-1023) Erythropoietin Swiss-ProtAcc. No. P01588 (mature form aa 28-193) U.S. Pat. No. 7,553,941,“Long-acting polypeptides and methods of producing same” Fibroblastgrowth factors 1, 2, 21 GenBank Acc. No. NP_061986 (FGF-1, 2, 21) U.S.Pat. No. 7,459,540, “Fibroblast growth factor- like polypeptides” U.S.Pat. No. 7,576,190, “FGF-21 fusion proteins” U.S. Pat. No. 7,491,697,“Muteins of fibroblast growth factor 21” U.S. Pat. No. 7,582,607,“Muteins of fibroblast growth factor 21” GenBank Acc. Nos. AAH18404 andABI75345 Granulocyte Colony Stimulating GenBank Acc. No. ABI85510.1Factor U.S. Pat. No. 7,381,804, “G-CSF analog compositions and methods”Growth Hormone GenBank NP_000506.2 Cytoplasmic (mature form aa 27-217)Secreted U.S. Pat. No. 7,553,941, “Long-acting polypeptides Variants andmethods of producing same” U.S. Pat. No. 7,553,940, “Long-acting EPOpolypeptides and derivatives thereof and methods thereof” Hepatocytegrowth factor (HGF) GenBank Acc. No. BAA14348 Keratinocyte growth factor(KGF) Leukemia Inhibitory Factor GenBank Acc. No. AAA51699 (mature formaa 25-213) U.S. Pat. No. 7,445,772, “Heterodimeric four helix bundlecytokines” Nerve growth factor (NGF) Platelet derived growth factor(PDGF) Thrombopoietin Swiss-Prot Acc. No. P40225 (amino acids 22-353)U.S. Pat. No. 
6,673,580, “Identification and modification ofimmunodominant epitopes in polypeptides” Transforming growthfactor-alpha (TGF-alpha) Transforming growth factor-beta (TGF-beta)Vascular endothelial growth factor GenBank Acc. No. CAA44447 (VEGF) U.S.Pat. No. 7,427,596, “Variants of vascular endothelial cell growthfactor, their uses, and processes for their production” U.S. Pat. No.7,566,566, “Materials and methods involving hybrid vascular endothelialgrowth factor DNAs and proteins” Human Therapeutic ApoA1 and ApoA1Milano GenBank Acc. No. CAT02154 Proteins GenBank Acc. No. ACK12192 U.S.Pat. No. 7,439,323, “Cysteine-containing peptides having antioxidantproperties” WO 2008/017906 (mature form aa 25-267) Insulin Swiss-ProtAcc. No. P01308 Proinsulin U.S. Pat. No. 7,547,821, “Methods for theproduction of insulin in plants” Insulin-like Growth Factor Swiss-Prot.Acc. No. P01343 (IA) U.S. Pat. No. 7,439,063, “Neuroprotective synergyof erythropoietin and insulin-like growth factors” Swiss-Prot. Acc. No.P05019 (IB) U.S. Pat. No. 7,217,796, “Neutralizing human anti- IGFRantibody” Kringle Domains of Human GenBank Acc. No. AAA36451 Plasminogen(amino acids 469-562) U.S. Pat. No. 7,175,840, “Compositions for genetherapy of rheumatoid arthritis including a gene encoding ananti-angiogenic protein or parts thereof” US Pat. App. Pub. No.2004/0138127, “Novel antiangiogenic peptides, polypeptides encoding sameand methods for inhibiting angiogenesis” Chaperones Hsp 90 (human)Swiss-Prot Acc. No. P07900 BiP (human) GRP94 (human) GRP170 (human)Calnexin (human) Calreticulin (human) HSP47 (human) ERp29 (human)Protein disulfide isomerase (PDI) (human) Peptidyl prolylcis-trans-isomerase (PPI) (human) ERp57 (human) Fusion Proteins/ Ontak(Eisai) Foss, FM, 2001, “Interleukin-2 fusion toxin: Non-naturalProteins targeted therapy for cutaneous T cell lymphoma,” Ann N Y AcadSci. 941: 166-76. Etanercept (Enbrel) Anthrax rPA fusions U.S. Pat. 
No.7,537,771, “Expression system” Therapeutic Nucleoside deaminase GenBankAcc. No. NP_000013.2 Enzymes Antimicrobial glycosidase- GenBank Acc. No.AAB53783 lysostaphin (aa 249-493) Bovine aprotinin U.S. Pat. No.5,621,074, “Aprotinin analogs” Butyrylcholine esterase GenBank Acc. No.AAA98113.1 U.S. Pat. No. 6,291,175, “Methods for treating a neurologicaldisease by determining BCHE genotype” Ornithine carbamoyltransferaseStreptokinase C GenBank Acc. No. P00779 Biocatalytic Carboxylic acidreductase U.S. Pat. No. 5,795,759, “Carboxylic acid reductase, Enzymes(Nocardia) and methods of using same” DszA U.S. Pat. No. 6,071,738,“Conversion of organosulfur DszB compounds to oxyorganosulfur compoundsDszC for desulfurization of fossil fuels” DszD U.S. Pat. No. 5,952,208,“Dsz gene expression in (Rhodococcus) pseudomonas hosts” L-aminoacylaseToogood, et al., 2002, “A thermostable L- (Thermococcus litoralis)aminoacylase from Thermococcus litoralis: cloning, overexpression,characterization, and applications in biotransformations,” Extremophiles6(2): 1431-0651 Singleton, et al., 2000, “Cloning, expression, andcharacterization of pyrrolidone carboxyl peptidase from the archaeonThermococcus litoralis” Extremophiles 4 (5), 297-303 Pathogen Chlamydiatrachomatis major outer GenBank Acc. No. ABB51004 Proteins/Antigensmembrane protein (MOMP) (mature form aa 23-393) Cowpea Chlorotic MottleVirus coat GenBank Acc. No. NP_613277 protein US Pat. App. Pub. No.2005/0214321, “Recombinant icosahedral virus like particle production inpseudomonads” Salmonella flagellin and variants GenBank Acc. No.AAA27067 thereof (Salmonella enterica subsp. enterica serovar Typhi)GenBank Acc. No. AAL20871 (Salmonella enterica subsp. enterica serovarTyphimurium str. LT2) US 2007/0224205, “Compositions that includehemagglutinin, methods of making and methods of use thereof” HIV GagGenBank Acc. No. AAB50258.1 (HIV-1 Gag) HIV Vpr Swiss-Prot Acc. No.P12520.2 HIV Nef GenBank Acc. No. 
AAA44993 (HIV-1 Nef) InfluenzaHemagglutinin GenBank Acc. No. ABW06108.1 (Influenza A HA) P. falciparumcircumsporozoite GenBank Acc. No. CAB38998 protein Reagent Proteins,Alpha-1-anti-trypsin U.S. Pat. No. 5,399,684, “DNA sequences expressingOther Proteins mammalian alpha-1-antitrypsin” U.S. Pat. No. 5,736,379,“DNA sequences expressing mammalian alpha₁ antitrypsin” HorseradishPeroxidase C GenBank Acc. No. CAA00083 LRP6 sub-domains Swiss-Prot Acc.No. O75581 (amino acids 20-1370 and subdomians thereof) U.S. Pat. No.7,416,849, “HBM variants that modulate bone mass and lipid levels”Protein A, Cysteinyl Protein A U.S. Pat. No. 5,151,350, “Cloned genesencoding recombinant protein A” U.S. Pat. No. 5,084,559, “Protein Adomain mutants” Streptavidin GenBank Acc. No. CAA00084

In embodiments of the present invention, expression systems that successfully overexpress toxin proteins are identified. Toxin proteins contemplated for expression include, but are not limited to, animal toxins, plant toxins, fungal toxins, and bacterial toxins. Toxin proteins frequently contain structural elements, for example disulfide bonds, that lead to misfolding and insolubility in overexpression efforts. Kunitz-type toxins (KTTs), found in the venom of animals including spiders, snakes, cone snails, and sea anemones, usually have a peptide chain of around 60 amino acids and are stabilized by three disulfide bridges. Botrocetin, a toxin from snake venom that causes platelet aggregation by inducing binding of von Willebrand factor (vWF) to platelet glycoprotein Ib (GPIb), is present in a two-chain form containing both intrachain and interchain disulfide bonds. The Botrocetin two-chain form was reported to be about thirty times more active than the single chain form (Usami, et al., 1993). Catrocollastatin C, a snake venom toxin that impairs platelet aggregation by inhibiting fibrinogen binding to the αIIbβ3 integrin, contains 28 cysteine residues that form 14 disulfide bonds (Calvete, et al., 2000).
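Because each disulfide bond pairs two cysteine residues, the cysteine count of a candidate sequence gives an upper bound on the number of disulfide bonds it can form, as in the catrocollastatin C example above (28 cysteines, 14 bonds). The following is a minimal sketch of that arithmetic; the function names and the example sequence are hypothetical, and the result is only an upper bound, not a prediction of which cysteines actually pair.

```python
def count_cysteines(protein_seq):
    """Count cysteine residues (one-letter code 'C') in an amino acid sequence."""
    return protein_seq.upper().count("C")


def max_disulfide_bonds(protein_seq):
    """Upper bound on disulfide bonds: each bond consumes two cysteines."""
    return count_cysteines(protein_seq) // 2


if __name__ == "__main__":
    # Made-up sequence with six cysteines, for illustration only.
    example = "ACDCEFCGHCIKCLMC"
    print(count_cysteines(example), max_disulfide_bonds(example))
```

A high bound relative to sequence length may help flag proteins for which periplasmic expression or folding-modulator strains of the array are worth screening first, although that prioritization is a judgment outside the scope of this sketch.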

Toxin-like proteins have been identified in non-venomous contexts and shown to act as cell activity modulators. Toxin-like proteins include proteases, protease inhibitors, cell antigens, growth factors, etc. A toxin classification tool available at http://www.clantox.cs.huji.ac.il/ predicts whether a given protein is a toxin or toxin-like protein. The server also provides other information, including the presence of a signal peptide, the number of cysteine residues, and associated functional annotations. The tool is described by Naamati, et al., 2009, “ClanTox: a classifier of short animal toxins,” Nucleic Acids Research 37, Web Server issue W363-W368, doi:10.1093/nar/gkp299.

Embodiments of the present invention contemplate the expression of antibodies or antibody fragments. Many forms of antibody fragments are known in the art and encompassed herein. “Antibody fragments” comprise only a portion of an intact antibody, generally including an antigen binding site of the intact antibody and thus retaining the ability to bind antigen. Examples of antibody fragments encompassed by the present definition include: (i) the Fab fragment, having VL, CL, VH and CH1 domains; (ii) the Fab′ fragment, which is a Fab fragment having one or more cysteine residues at the C-terminus of the CH1 domain; (iii) the Fd fragment having VH and CH1 domains; (iv) the Fd′ fragment having VH and CH1 domains and one or more cysteine residues at the C-terminus of the CH1 domain; (v) the Fv fragment having the VL and VH domains of a single arm of an antibody; (vi) the dAb fragment (Ward et al., Nature 341, 544-546 (1989)) which consists of a VH domain; (vii) isolated CDR regions; (viii) F(ab′)₂ fragments, a bivalent fragment including two Fab′ fragments linked by a disulfide bridge at the hinge region; (ix) single chain antibody molecules (e.g., single chain Fv; scFv) (Bird et al., Science 242:423-426 (1988); and Huston et al., PNAS (USA) 85:5879-5883 (1988)); (x) “diabodies” with two antigen binding sites, comprising a heavy chain variable domain (VH) connected to a light chain variable domain (VL) in the same polypeptide chain (see, e.g., EP 404,097; WO 93/11161; and Hollinger et al., Proc. Natl. Acad. Sci. USA, 90:6444-6448 (1993)); and (xi) “linear antibodies” comprising a pair of tandem Fd segments (VH-CH₁-VH-CH₁) which, together with complementary light chain polypeptides, form a pair of antigen binding regions (Zapata et al. Protein Eng. 8(10): 1057-1062 (1995); and U.S. Pat. No. 5,641,870).

Moreover, embodiments of the present invention may include expression of antibody fragments that are modified to improve their stability and/or to create antibody complexes with multivalency. For many medical applications, antibody fragments must be sufficiently stable against denaturation or proteolysis conditions, and the antibody fragments should ideally bind the target antigens with high affinity. A variety of techniques and materials have been developed to provide stabilized and/or multivalent antibody fragments. An antibody fragment may be fused to a dimerization domain. In one embodiment, the antibody fragments expressed using the compositions and methods of the present invention are dimerized by the attachment of a dimerization domain, such as a leucine zipper.

Fusion proteins and other non-natural proteins are also contemplated for expression using the methods and compositions of the invention. A non-natural protein can be, e.g., an engineered protein or a protein obtained by molecular modeling. An example of a fusion protein is Ontak (Eisai Corporation), also called denileukin diftitox or interleukin-2 (IL-2) fusion protein. Ontak was made by replacing the receptor-binding domain of diphtheria toxin with IL-2, the receptor for which is overexpressed in leukemia cells. IL-2 acts to carry the protein inhibitory function of diphtheria toxin to the targeted leukemia cells. Another fusion, Etanercept (Enbrel), links the human gene for soluble TNF receptor 2 to the gene for the Fc component of human immunoglobulin G1.

It is understood that the compositions and methods of the invention can be used to express variants and mutants of the proteins listed herein, regardless of whether specifically noted. Furthermore, as previously described, sequence information required for molecular genetics and genetic engineering techniques relating to many known proteins is widely available, e.g., from GenBank or other sources known to those of skill in the art. The GenBank data herein are provided by way of example. It is understood that if a GenBank accession number is not expressly provided herein, one of skill in the art can identify a desired gene or protein sequence by searching the GenBank database or the published literature.

It is generally recognized that a search of the GenBank database for a particular protein or gene can yield multiple hits. This can be due, e.g., to multiple listings of the same sequence, the occurrence of analogous genes or proteins in different species, or to the listing of truncated, partial, or variant sequences. One knowledgeable in the art will be aware that information relating to the sequence entry is provided in the accompanying information within the record, for example, in a published report cited in the record. Therefore, one of skill in the art, when searching for a sequence to use in the methods and compositions of the invention, will be able to identify the desired sequence from among a list of multiple results.

It is common knowledge in the art that proteins can be functionally equivalent despite differences in amino acid sequence. Substitution of an amino acid by a different amino acid having similar chemical properties and size (e.g., a conservative substitution) often does not significantly change protein function. Even nonconservative amino acid substitutions can be made with no effect on function, for example, when the change is made in a part of the protein that is not critical for function.

A “conservative amino acid substitution” is one in which the amino acid residue is replaced with an amino acid residue having a similar side chain, or similar physicochemical characteristics (e.g., electrostatic, hydrogen bonding, isosteric, hydrophobic features). The amino acids may be naturally occurring or non-natural (unnatural). Families of amino acid residues having similar side chains are known in the art. These families include amino acids with basic side chains (e.g., lysine, arginine, histidine), acidic side chains (e.g., aspartic acid, glutamic acid), uncharged polar side chains (e.g., glycine, asparagine, glutamine, serine, threonine, tyrosine, methionine, cysteine), nonpolar side chains (e.g., alanine, valine, leucine, isoleucine, proline, phenylalanine, tryptophan), beta-branched side chains (e.g., threonine, valine, isoleucine) and aromatic side chains (e.g., tyrosine, phenylalanine, tryptophan, histidine). Substitutions may also include non-conservative changes. Substitutions may also include changes that result in an increased resistance to proteolysis, for example, changes that eliminate a protease recognition site in the recombinant protein. It is also known to one of skill in the art that proteins having the same amino acid sequence can be encoded by different nucleotide sequences due to the redundancy in the genetic code. The present invention thus includes the use of protein sequences that are different from the sequences provided or referenced herein, or available from public sources, but that are functionally equivalent nonetheless. Also included are proteins that have the same amino acid sequences but are encoded by different nucleotide sequences.
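By way of illustration only, the side-chain families recited above can be encoded as sets of one-letter amino acid codes, and a substitution can be treated as conservative when both residues fall within at least one common family. The following is a minimal sketch under that assumption; the names SIDE_CHAIN_FAMILIES, shared_families, and is_conservative are hypothetical, and a shared family does not by itself establish functional equivalence.

```python
# Side-chain families exactly as recited in the preceding paragraph,
# written with one-letter amino acid codes.
SIDE_CHAIN_FAMILIES = {
    "basic": set("KRH"),                # lysine, arginine, histidine
    "acidic": set("DE"),                # aspartic acid, glutamic acid
    "uncharged polar": set("GNQSTYMC"),
    "nonpolar": set("AVLIPFW"),
    "beta-branched": set("TVI"),
    "aromatic": set("YFWH"),
}


def shared_families(original, replacement):
    """Return the sorted names of families containing both residues."""
    return sorted(
        name
        for name, members in SIDE_CHAIN_FAMILIES.items()
        if original in members and replacement in members
    )


def is_conservative(original, replacement):
    """True if two different residues share at least one listed family."""
    return original != replacement and bool(shared_families(original, replacement))


print(is_conservative("K", "R"))   # lysine -> arginine, both basic
print(is_conservative("D", "K"))   # aspartic acid -> lysine, no shared family
print(shared_families("V", "I"))   # valine and isoleucine share two families
```

Because the families overlap (for example, histidine is listed as both basic and aromatic), the lookup returns every shared family rather than a single label.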

Codon usage or codon preference is well known in the art. The selected coding sequence may be modified by altering the genetic code thereof to match that employed by the bacterial host cell, and the codon sequence thereof may be enhanced to better approximate that employed by the host. Genetic code selection and codon frequency enhancement may be performed according to any of the various methods known to one of ordinary skill in the art, e.g., oligonucleotide-directed mutagenesis. Useful on-line Internet resources to assist in this process include, e.g.: (1) the Codon Usage Database of the Kazusa DNA Research Institute (2-6-7 Kazusa-kamatari, Kisarazu, Chiba 292-0818 Japan) and available at http://www.kazusa.or.jp/codon/; and (2) the Genetic Codes tables available from the NCBI Taxonomy database at http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c. For example, Pseudomonas species are reported as utilizing Genetic Code Translation Table 11 of the NCBI Taxonomy site, and at the Kazusa site as exhibiting the codon usage frequency of the table shown at http://www.kazusa.or.jp/codon/cgibin/.
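Codon frequency enhancement of the kind described above can be sketched as follows. This is an illustrative example only: each codon of a coding sequence is replaced with the synonymous codon that has the highest frequency in a host usage table, so the encoded amino acid sequence is unchanged. The codon assignments are those of the standard table, which Translation Table 11 shares apart from its alternative start codons. The usage values in TOY_USAGE are invented placeholders, not data from the Kazusa database, and the function names are hypothetical.

```python
from itertools import product

# Codon assignments in NCBI order (bases T, C, A, G; first base slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA coding sequence codon by codon ('*' marks a stop)."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def preferred_codons(usage):
    """Map each amino acid to its most frequent codon in the usage table."""
    best = {}
    for codon, aa in CODON_TABLE.items():
        if aa not in best or usage.get(codon, 0.0) > usage.get(best[aa], 0.0):
            best[aa] = codon
    return best


def recode(dna, usage):
    """Rewrite a coding sequence using the host's preferred codons."""
    best = preferred_codons(usage)
    return "".join(best[aa] for aa in translate(dna))


# Hypothetical host codon frequencies, for illustration only.
TOY_USAGE = {"AAG": 0.8, "AAA": 0.2, "GCC": 0.5, "GCT": 0.1, "TGA": 0.6}

original = "ATGAAAGCTTAA"
recoded = recode(original, TOY_USAGE)
print(recoded, translate(original) == translate(recoded))
```

Selecting only the single most frequent codon is the simplest strategy; other approaches known in the art instead match the overall codon frequency distribution of the host.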

Equivalence in protein function can be evaluated by any of a number of assays suitable for the particular protein, as known in the art and described elsewhere herein. For example, the function of an antibody can be evaluated by measuring its binding to its target antigen, and enzymes can be evaluated by activity assay.

Host Cell

In one embodiment the invention provides an array of P. fluorescens host cells from which to optimally produce a heterologous protein or peptide of interest. P. fluorescens has been demonstrated to be an improved platform for production of a variety of proteins and several efficient secretion signals have been identified from this organism (see, e.g., U.S. Pat. App. Pub. No. 2006/0008877 and 2008/0193974).

The Pseudomonad system offers advantages for commercial expression of polypeptides and enzymes, in comparison with other bacterial expression systems. In particular, P. fluorescens has been identified as an advantageous expression system. P. fluorescens encompasses a group of common, nonpathogenic saprophytes that colonize soil, water and plant surface environments. Commercial enzymes derived from P. fluorescens have been used to reduce environmental contamination, as detergent additives, and for stereoselective hydrolysis. P. fluorescens is also used agriculturally to control pathogens. U.S. Pat. No. 4,695,462 describes the expression of recombinant bacterial proteins in P. fluorescens.

It is contemplated that alternate host cells, particularly E. coli, which utilizes expression elements described herein in a manner similar to P. fluorescens, or a multiplicity of different host cells, can be used to generate an array comprising a plurality of phenotypically distinct host cells that have been genetically modified to modulate the expression of one or more target genes, as discussed supra. The host cell can be any organism in which target genes can be altered. Methods of identifying target genes homologous to those listed in Tables 1 and 2 are known in the art. Further, one of skill in the art would understand how to identify target genes that are native to or useful in a host cell of interest. Many of these proteins are well known in the art. See, for example, U.S. Patent Application Publication No. 2006/0110747.

Host cells can be selected from “Gram-negative Proteobacteria Subgroup 18.” “Gram-negative Proteobacteria Subgroup 18” is defined as the group of all subspecies, varieties, strains, and other sub-special units of the species Pseudomonas fluorescens, including those belonging, e.g., to the following (with the ATCC or other deposit numbers of exemplary strain(s) shown in parentheses): Pseudomonas fluorescens biotype A, also called biovar 1 or biovar I (ATCC 13525); Pseudomonas fluorescens biotype B, also called biovar 2 or biovar II (ATCC 17816); Pseudomonas fluorescens biotype C, also called biovar 3 or biovar III (ATCC 17400); Pseudomonas fluorescens biotype F, also called biovar 4 or biovar IV (ATCC 12983); Pseudomonas fluorescens biotype G, also called biovar 5 or biovar V (ATCC 17518); Pseudomonas fluorescens biovar VI; Pseudomonas fluorescens Pf0-1; Pseudomonas fluorescens Pf-5 (ATCC BAA-477); Pseudomonas fluorescens SBW25; and Pseudomonas fluorescens subsp. cellulosa (NCIMB 10462).

The host cell can be selected from “Gram-negative Proteobacteria Subgroup 19.” “Gram-negative Proteobacteria Subgroup 19” is defined as the group of all strains of Pseudomonas fluorescens biotype A. A particularly preferred strain of this biotype is P. fluorescens strain MB101 (see U.S. Pat. No. 5,169,760 to Wilcox), and derivatives thereof. An example of a preferred derivative thereof is P. fluorescens strain MB214, constructed by inserting into the MB101 chromosomal asd (aspartate dehydrogenase gene) locus, a native E. coli PlacI-lacI-lacZYA construct (i.e., in which PlacZ was deleted).

Additional P. fluorescens strains that can be used in the present invention include Pseudomonas fluorescens Migula and Pseudomonas fluorescens Loitokitok, having the following ATCC designations: [NCIB 8286]; NRRL B-1244; NCIB 8865 strain CO1; NCIB 8866 strain CO2; 1291 [ATCC 17458; IFO 15837; NCIB 8917; LA; NRRL B-1864; pyrrolidine; PW2 [ICMP 3966; NCPPB 967; NRRL B-899]; 13475; NCTC 10038; NRRL B-1603 [6; IFO 15840]; 52-1C; CCEB 488-A [BU 140]; CCEB 553 [EM 15/47]; IAM 1008 [AHH-27]; IAM 1055 [AHH-23]; 1 [IFO 15842]; 12 [ATCC 25323; NIH 11; den Dooren de Jong 216]; 18 [IFO 15833; WRRL P-7]; 93 [TR-10]; 108 [52-22; IFO 15832]; 143 [IFO 15836; PL]; 149 [2-40-40; IFO 15838]; 182 [IFO 3081; PJ 73]; 184 [IFO 15830]; 185 [W2 L-1]; 186 [IFO 15829; PJ 79]; 187 [NCPPB 263]; 188 [NCPPB 316]; 189 [PJ227; 1208]; 191 [IFO 15834; PJ 236; 22/1]; 194 [Klinge R-60; PJ 253]; 196 [PJ 288]; 197 [PJ 290]; 198 [PJ 302]; 201 [PJ 368]; 202 [PJ 372]; 203 [PJ 376]; 204 [IFO 15835; PJ 682]; 205 [PJ 686]; 206 [PJ 692]; 207 [PJ 693]; 208 [PJ 722]; 212 [PJ 832]; 215 [PJ 849]; 216 [PJ 885]; 267 [B-9]; 271 [B-1612]; 401 [C71A; IFO 15831; PJ 187]; NRRL B-3178 [4; IFO 15841]; KY 8521; 3081; 30-21; [IFO 3081]; N; PYR; PW; D946-B83 [BU 2183; FERM-P 3328]; P-2563 [FERM-P 2894; IFO 13658]; IAM-1126 [43F]; M-1; A506 [A5-06]; A505 [A5-05-1]; A526 [A5-26]; B69; 72; NRRL B-4290; PMW6 [NCIB 11615]; SC 12936; A1 [IFO 15839]; F 1847 [CDC-EB]; F 1848 [CDC 93]; NCIB 10586; P17; F-12; AmMS257; PRA25; 6133D02; 6519E01; Ni; SC15208; BNL-WVC; NCTC 2583 [NCIB 8194]; H13; 1013 [ATCC 11251; CCEB 295]; IFO 3903; 1062; or Pf-5.

In one embodiment, the host cell can be any cell capable of producing a protein or polypeptide of interest, including a P. fluorescens cell as described above. The most commonly used systems to produce proteins or polypeptides of interest include certain bacterial cells, particularly E. coli, because of their relatively inexpensive growth requirements and potential capacity to produce protein in large batch cultures. Yeasts are also used to express biologically relevant proteins and polypeptides, particularly for research purposes. Systems include Saccharomyces cerevisiae or Pichia pastoris. These systems are well characterized, provide generally acceptable levels of total protein production and are comparatively fast and inexpensive. Insect cell expression systems have also emerged as an alternative for expressing recombinant proteins in biologically active form. In some cases, correctly folded proteins that are post-translationally modified can be produced. Mammalian cell expression systems, such as Chinese hamster ovary cells, have also been used for the expression of proteins or polypeptides of interest. On a small scale, these expression systems are often effective. Certain biologics can be derived from proteins, particularly in animal or human health applications. In another embodiment, the host cell is a plant cell, including, but not limited to, a tobacco cell, a corn cell, a cell from an Arabidopsis species, a potato cell, or a rice cell.

In another embodiment, the host cell can be a prokaryotic cell such as a bacterial cell including, but not limited to, an Escherichia or a Pseudomonas species. Typical bacterial cells are described, for example, in “Biological Diversity: Bacteria and Archaeans”, a chapter of the On-Line Biology Book, provided by Dr M J Farabee of the Estrella Mountain Community College, Arizona, USA at the website www.emc.maricopa.edu/faculty/farabee/BIOBK/BioBookDiversity. In certain embodiments, the host cell can be a Pseudomonad cell, and can typically be a P. fluorescens cell. In other embodiments, the host cell can also be an E. coli cell. In another embodiment the host cell can be a eukaryotic cell, for example an insect cell, including but not limited to a cell from a Spodoptera, Trichoplusia, Drosophila or an Estigmene species, or a mammalian cell, including but not limited to a murine cell, a hamster cell, a monkey cell, a primate cell, or a human cell.

In one embodiment, the host cell can be a member of any of the bacterial taxa. The cell can, for example, be a member of any species of eubacteria. The host can be a member of any one of the taxa: Acidobacteria, Actinobacteria, Aquificae, Bacteroidetes, Chlorobi, Chlamydiae, Chloroflexi, Chrysiogenetes, Cyanobacteria, Deferribacteres, Deinococcus, Dictyoglomi, Fibrobacteres, Firmicutes, Fusobacteria, Gemmatimonadetes, Lentisphaerae, Nitrospirae, Planctomycetes, Proteobacteria, Spirochaetes, Thermodesulfobacteria, Thermomicrobia, Thermotogae, Thermus (Thermales), or Verrucomicrobia. In an embodiment of a eubacterial host cell, the cell can be a member of any species of eubacteria, excluding Cyanobacteria.

The bacterial host can also be a member of any species of Proteobacteria. A proteobacterial host cell can be a member of any one of the taxa Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, or Epsilonproteobacteria. In addition, the host can be a member of any one of the taxa Alphaproteobacteria, Betaproteobacteria, or Gammaproteobacteria, and a member of any species of Gammaproteobacteria.

In one embodiment of a Gamma Proteobacterial host, the host will be a member of any one of the taxa Aeromonadales, Alteromonadales, Enterobacteriales, Pseudomonadales, or Xanthomonadales; or a member of any species of the Enterobacteriales or Pseudomonadales. Where the host cell is of the order Enterobacteriales, the host cell will be a member of the family Enterobacteriaceae, or may be a member of any one of the genera Erwinia, Escherichia, or Serratia; or a member of the genus Escherichia. Where the host cell is of the order Pseudomonadales, the host cell may be a member of the family Pseudomonadaceae, including the genus Pseudomonas. Gamma Proteobacterial hosts include members of the species Escherichia coli and members of the species Pseudomonas fluorescens.

Other Pseudomonas organisms may also be useful. Pseudomonads and closely related species include Gram-negative Proteobacteria Subgroup 1, which include the group of Proteobacteria belonging to the families and/or genera described as “Gram-Negative Aerobic Rods and Cocci” by R. E. Buchanan and N. E. Gibbons (eds.), Bergey's Manual of Determinative Bacteriology, pp. 217-289 (8th ed., 1974) (The Williams & Wilkins Co., Baltimore, Md., USA) (hereinafter “Bergey (1974)”). Table 10 presents these families and genera of organisms.

TABLE 10
Families and Genera Listed in the Part, “Gram-Negative Aerobic Rods and Cocci” (in Bergey (1974))

Family I. Pseudomonadaceae: Gluconobacter, Pseudomonas, Xanthomonas, Zoogloea
Family II. Azotobacteraceae: Azomonas, Azotobacter, Beijerinckia, Derxia
Family III. Rhizobiaceae: Agrobacterium, Rhizobium
Family IV. Methylomonadaceae: Methylococcus, Methylomonas
Family V. Halobacteriaceae: Halobacterium, Halococcus
Other Genera: Acetobacter, Alcaligenes, Bordetella, Brucella, Francisella, Thermus

“Gram-negative Proteobacteria Subgroup 1” also includes Proteobacteria that would be classified in this heading according to the criteria used in the classification. The heading also includes groups that were previously classified in this section but are no longer, such as the genera Acidovorax, Brevundimonas, Burkholderia, Hydrogenophaga, Oceanimonas, Ralstonia, and Stenotrophomonas; the genus Sphingomonas (and the genus Blastomonas, derived therefrom), which was created by regrouping organisms belonging to (and previously called species of) the genus Xanthomonas; and the genus Acidomonas, which was created by regrouping organisms belonging to the genus Acetobacter as defined in Bergey (1974). In addition, hosts can include cells from the genus Pseudomonas, Pseudomonas enalia (ATCC 14393), Pseudomonas nigrifaciens (ATCC 19375), and Pseudomonas putrefaciens (ATCC 8071), which have been reclassified respectively as Alteromonas haloplanktis, Alteromonas nigrifaciens, and Alteromonas putrefaciens. Similarly, e.g., Pseudomonas acidovorans (ATCC 15668) and Pseudomonas testosteroni (ATCC 11996) have since been reclassified as Comamonas acidovorans and Comamonas testosteroni, respectively; and Pseudomonas nigrifaciens (ATCC 19375) and Pseudomonas piscicida (ATCC 15057) have been reclassified respectively as Pseudoalteromonas nigrifaciens and Pseudoalteromonas piscicida. “Gram-negative Proteobacteria Subgroup 1” also includes Proteobacteria classified as belonging to any of the families: Pseudomonadaceae, Azotobacteraceae (now often called by the synonym, the “Azotobacter group” of Pseudomonadaceae), Rhizobiaceae, and Methylomonadaceae (now often called by the synonym, “Methylococcaceae”). Consequently, in addition to those genera otherwise described herein, further Proteobacterial genera falling within “Gram-negative Proteobacteria Subgroup 1” include: 1) Azotobacter group bacteria of the genus Azorhizophilus; 2) Pseudomonadaceae family bacteria of the genera Cellvibrio, Oligella, and Teredinibacter; 3) Rhizobiaceae family bacteria of the genera Chelatobacter, Ensifer, Liberibacter (also called “Candidatus Liberibacter”), and Sinorhizobium; and 4) Methylococcaceae family bacteria of the genera Methylobacter, Methylocaldum, Methylomicrobium, Methylosarcina, and Methylosphaera.

In another embodiment, the host cell is selected from “Gram-negative Proteobacteria Subgroup 2.” “Gram-negative Proteobacteria Subgroup 2” is defined as the group of Proteobacteria of the following genera (with the total numbers of catalog-listed, publicly-available, deposited strains thereof indicated in parentheses, all deposited at ATCC, except as otherwise indicated): Acidomonas (2); Acetobacter (93); Gluconobacter (37); Brevundimonas (23); Beijerinckia (13); Derxia (2); Brucella (4); Agrobacterium (79); Chelatobacter (2); Ensifer (3); Rhizobium (144); Sinorhizobium (24); Blastomonas (1); Sphingomonas (27); Alcaligenes (88); Bordetella (43); Burkholderia (73); Ralstonia (33); Acidovorax (20); Hydrogenophaga (9); Zoogloea (9); Methylobacter (2); Methylocaldum (1 at NCIMB); Methylococcus (2); Methylomicrobium (2); Methylomonas (9); Methylosarcina (1); Methylosphaera; Azomonas (9); Azorhizophilus (5); Azotobacter (64); Cellvibrio (3); Oligella (5); Pseudomonas (1139); Francisella (4); Xanthomonas (229); Stenotrophomonas (50); and Oceanimonas (4).

Exemplary host cell species of “Gram-negative Proteobacteria Subgroup 2” include, but are not limited to the following bacteria (with the ATCC or other deposit numbers of exemplary strain(s) thereof shown in parentheses): Acidomonas methanolica (ATCC 43581); Acetobacter aceti (ATCC 15973); Gluconobacter oxydans (ATCC 19357); Brevundimonas diminuta (ATCC 11568); Beijerinckia indica (ATCC 9039 and ATCC 19361); Derxia gummosa (ATCC 15994); Brucella melitensis (ATCC 23456), Brucella abortus (ATCC 23448); Agrobacterium tumefaciens (ATCC 23308), Agrobacterium radiobacter (ATCC 19358), Agrobacterium rhizogenes (ATCC 11325); Chelatobacter heintzii (ATCC 29600); Ensifer adhaerens (ATCC 33212); Rhizobium leguminosarum (ATCC 10004); Sinorhizobium fredii (ATCC 35423); Blastomonas natatoria (ATCC 35951); Sphingomonas paucimobilis (ATCC 29837); Alcaligenes faecalis (ATCC 8750); Bordetella pertussis (ATCC 9797); Burkholderia cepacia (ATCC 25416); Ralstonia pickettii (ATCC 27511); Acidovorax facilis (ATCC 11228); Hydrogenophaga flava (ATCC 33667); Zoogloea ramigera (ATCC 19544); Methylobacter luteus (ATCC 49878); Methylocaldum gracile (NCIMB 11912); Methylococcus capsulatus (ATCC 19069); Methylomicrobium agile (ATCC 35068); Methylomonas methanica (ATCC 35067); Methylosarcina fibrata (ATCC 700909); Methylosphaera hansonii (ACAM 549); Azomonas agilis (ATCC 7494); Azorhizophilus paspali (ATCC 23833); Azotobacter chroococcum (ATCC 9043); Cellvibrio mixtus (UQM 2601); Oligella urethralis (ATCC 17960); Pseudomonas aeruginosa (ATCC 10145), Pseudomonas fluorescens (ATCC 35858); Francisella tularensis (ATCC 6223); Stenotrophomonas maltophilia (ATCC 13637); Xanthomonas campestris (ATCC 33913); and Oceanimonas doudoroffii (ATCC 27123).

In another embodiment, the host cell is selected from “Gram-negative Proteobacteria Subgroup 3.” “Gram-negative Proteobacteria Subgroup 3” is defined as the group of Proteobacteria of the following genera: Brevundimonas; Agrobacterium; Rhizobium; Sinorhizobium; Blastomonas; Sphingomonas; Alcaligenes; Burkholderia; Ralstonia; Acidovorax; Hydrogenophaga; Methylobacter; Methylocaldum; Methylococcus; Methylomicrobium; Methylomonas; Methylosarcina; Methylosphaera; Azomonas; Azorhizophilus; Azotobacter; Cellvibrio; Oligella; Pseudomonas; Teredinibacter; Francisella; Stenotrophomonas; Xanthomonas; and Oceanimonas.

In another embodiment, the host cell is selected from “Gram-negative Proteobacteria Subgroup 4.” “Gram-negative Proteobacteria Subgroup 4” is defined as the group of Proteobacteria of the following genera: Brevundimonas; Blastomonas; Sphingomonas; Burkholderia; Ralstonia; Acidovorax; Hydrogenophaga; Methylobacter; Methylocaldum; Methylococcus; Methylomicrobium; Methylomonas; Methylosarcina; Methylosphaera; Azomonas; Azorhizophilus; Azotobacter; Cellvibrio; Oligella; Pseudomonas; Teredinibacter; Francisella; Stenotrophomonas; Xanthomonas; and Oceanimonas.

In another embodiment, the host cell is selected from “Gram-negative Proteobacteria Subgroup 5.” “Gram-negative Proteobacteria Subgroup 5” is defined as the group of Proteobacteria of the following genera: Methylobacter; Methylocaldum; Methylococcus; Methylomicrobium; Methylomonas; Methylosarcina; Methylosphaera; Azomonas; Azorhizophilus; Azotobacter; Cellvibrio; Oligella; Pseudomonas; Teredinibacter; Francisella; Stenotrophomonas; Xanthomonas; and Oceanimonas.

The host cell can be selected from “Gram-negative Proteobacteria Subgroup 6.” “Gram-negative Proteobacteria Subgroup 6” is defined as the group of Proteobacteria of the following genera: Brevundimonas; Blastomonas; Sphingomonas; Burkholderia; Ralstonia; Acidovorax; Hydrogenophaga; Azomonas; Azorhizophilus; Azotobacter; Cellvibrio; Oligella; Pseudomonas; Teredinibacter; Stenotrophomonas; Xanthomonas; and Oceanimonas.

The host cell can be selected from “Gram-negative Proteobacteria Subgroup 7.” “Gram-negative Proteobacteria Subgroup 7” is defined as the group of Proteobacteria of the following genera: Azomonas; Azorhizophilus; Azotobacter; Cellvibrio; Oligella; Pseudomonas; Teredinibacter; Stenotrophomonas; Xanthomonas; and Oceanimonas.

The host cell can be selected from “Gram-negative Proteobacteria Subgroup 8.” “Gram-negative Proteobacteria Subgroup 8” is defined as the group of Proteobacteria of the following genera: Brevundimonas; Blastomonas; Sphingomonas; Burkholderia; Ralstonia; Acidovorax; Hydrogenophaga; Pseudomonas; Stenotrophomonas; Xanthomonas; and Oceanimonas.

The host cell can be selected from “Gram-negative Proteobacteria Subgroup 9.” “Gram-negative Proteobacteria Subgroup 9” is defined as the group of Proteobacteria of the following genera: Brevundimonas; Burkholderia; Ralstonia; Acidovorax; Hydrogenophaga; Pseudomonas; Stenotrophomonas; and Oceanimonas.

The host cell can be selected from “Gram-negative Proteobacteria Subgroup 10.” “Gram-negative Proteobacteria Subgroup 10” is defined as the group of Proteobacteria of the following genera: Burkholderia; Ralstonia; Pseudomonas; Stenotrophomonas; and Xanthomonas.

The host cell can be selected from “Gram-negative Proteobacteria Subgroup 11.” “Gram-negative Proteobacteria Subgroup 11” is defined as the group of Proteobacteria of the genera: Pseudomonas; Stenotrophomonas; and Xanthomonas. The host cell can be selected from “Gram-negative Proteobacteria Subgroup 12.” “Gram-negative Proteobacteria Subgroup 12” is defined as the group of Proteobacteria of the following genera: Burkholderia; Ralstonia; and Pseudomonas. The host cell can be selected from “Gram-negative Proteobacteria Subgroup 13.” “Gram-negative Proteobacteria Subgroup 13” is defined as the group of Proteobacteria of the following genera: Burkholderia; Ralstonia; Pseudomonas; and Xanthomonas. The host cell can be selected from “Gram-negative Proteobacteria Subgroup 14.” “Gram-negative Proteobacteria Subgroup 14” is defined as the group of Proteobacteria of the following genera: Pseudomonas and Xanthomonas. The host cell can be selected from “Gram-negative Proteobacteria Subgroup 15.” “Gram-negative Proteobacteria Subgroup 15” is defined as the group of Proteobacteria of the genus Pseudomonas.
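The genus-level subgroup definitions lend themselves to representation as sets, which makes membership and nesting relationships easy to check. The following sketch, offered for illustration only, encodes Subgroups 11 through 15 exactly as defined in the preceding paragraph; the names SUBGROUPS and subgroups_containing are hypothetical.

```python
# Genus membership of Subgroups 11-15, transcribed from the definitions above.
SUBGROUPS = {
    11: {"Pseudomonas", "Stenotrophomonas", "Xanthomonas"},
    12: {"Burkholderia", "Ralstonia", "Pseudomonas"},
    13: {"Burkholderia", "Ralstonia", "Pseudomonas", "Xanthomonas"},
    14: {"Pseudomonas", "Xanthomonas"},
    15: {"Pseudomonas"},
}


def subgroups_containing(genus):
    """Return the sorted subgroup numbers whose definition lists the genus."""
    return sorted(n for n, genera in SUBGROUPS.items() if genus in genera)


# Genera common to every one of these subgroups.
common_genera = set.intersection(*SUBGROUPS.values())

print(subgroups_containing("Xanthomonas"))
print(SUBGROUPS[15] <= SUBGROUPS[14] <= SUBGROUPS[11])  # nesting check
print(common_genera)
```

The subset comparison shows that Subgroup 15 is contained in Subgroup 14, which in turn is contained in Subgroup 11, and the intersection shows that Pseudomonas is the only genus common to all five.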

The host cell can be selected from “Gram-negative Proteobacteria Subgroup 16.” “Gram-negative Proteobacteria Subgroup 16” is defined as the group of Proteobacteria of the following Pseudomonas species (with the ATCC or other deposit numbers of exemplary strain(s) shown in parentheses): Pseudomonas abietaniphila (ATCC 700689); Pseudomonas aeruginosa (ATCC 10145); Pseudomonas alcaligenes (ATCC 14909); Pseudomonas anguilliseptica (ATCC 33660); Pseudomonas citronellolis (ATCC 13674); Pseudomonas flavescens (ATCC 51555); Pseudomonas mendocina (ATCC 25411); Pseudomonas nitroreducens (ATCC 33634); Pseudomonas oleovorans (ATCC 8062); Pseudomonas pseudoalcaligenes (ATCC 17440); Pseudomonas resinovorans (ATCC 14235); Pseudomonas straminea (ATCC 33636); Pseudomonas agarici (ATCC 25941); Pseudomonas alcaliphila; Pseudomonas alginovora; Pseudomonas andersonii; Pseudomonas asplenii (ATCC 23835); Pseudomonas azelaica (ATCC 27162); Pseudomonas beyerinckii (ATCC 19372); Pseudomonas borealis; Pseudomonas boreopolis (ATCC 33662); Pseudomonas brassicacearum; Pseudomonas butanovora (ATCC 43655); Pseudomonas cellulosa (ATCC 55703); Pseudomonas aurantiaca (ATCC 33663); Pseudomonas chlororaphis (ATCC 9446, ATCC 13985, ATCC 17418, ATCC 17461); Pseudomonas fragi (ATCC 4973); Pseudomonas lundensis (ATCC 49968); Pseudomonas taetrolens (ATCC 4683); Pseudomonas cissicola (ATCC 33616); Pseudomonas coronafaciens; Pseudomonas diterpeniphila; Pseudomonas elongata (ATCC 10144); Pseudomonas flectens (ATCC 12775); Pseudomonas azotoformans; Pseudomonas brenneri; Pseudomonas cedrella; Pseudomonas corrugata (ATCC 29736); Pseudomonas extremorientalis; Pseudomonas fluorescens (ATCC 35858); Pseudomonas gessardii; Pseudomonas libanensis; Pseudomonas mandelii (ATCC 700871); Pseudomonas marginalis (ATCC 10844); Pseudomonas migulae; Pseudomonas mucidolens (ATCC 4685); Pseudomonas orientalis; Pseudomonas rhodesiae; Pseudomonas synxantha (ATCC 9890); Pseudomonas tolaasii (ATCC 33618); Pseudomonas veronii (ATCC 700474); Pseudomonas frederiksbergensis; Pseudomonas geniculata (ATCC 19374); Pseudomonas gingeri; Pseudomonas graminis; Pseudomonas grimontii; Pseudomonas halodenitrificans; Pseudomonas halophila; Pseudomonas hibiscicola (ATCC 19867); Pseudomonas huttiensis (ATCC 14670); Pseudomonas hydrogenovora; Pseudomonas jessenii (ATCC 700870); Pseudomonas kilonensis; Pseudomonas lanceolata (ATCC 14669); Pseudomonas lini; Pseudomonas marginata (ATCC 25417); Pseudomonas mephitica (ATCC 33665); Pseudomonas denitrificans (ATCC 19244); Pseudomonas pertucinogena (ATCC 190); Pseudomonas pictorum (ATCC 23328); Pseudomonas psychrophila; Pseudomonas fulva (ATCC 31418); Pseudomonas monteilii (ATCC 700476); Pseudomonas mosselii; Pseudomonas oryzihabitans (ATCC 43272); Pseudomonas plecoglossicida (ATCC 700383); Pseudomonas putida (ATCC 12633); Pseudomonas reactans; Pseudomonas spinosa (ATCC 14606); Pseudomonas balearica; Pseudomonas luteola (ATCC 43273); Pseudomonas stutzeri (ATCC 17588); Pseudomonas amygdali (ATCC 33614); Pseudomonas avellanae (ATCC 700331); Pseudomonas caricapapayae (ATCC 33615); Pseudomonas cichorii (ATCC 10857); Pseudomonas ficuserectae (ATCC 35104); Pseudomonas fuscovaginae; Pseudomonas meliae (ATCC 33050); Pseudomonas syringae (ATCC 19310); Pseudomonas viridiflava (ATCC 13223); Pseudomonas thermocarboxydovorans (ATCC 35961); Pseudomonas thermotolerans; Pseudomonas thivervalensis; Pseudomonas vancouverensis (ATCC 700688); Pseudomonas wisconsinensis; and Pseudomonas xiamenensis.

The host cell can be selected from “Gram-negative Proteobacteria Subgroup 17.” “Gram-negative Proteobacteria Subgroup 17” is defined as the group of Proteobacteria known in the art as the “fluorescent Pseudomonads” including those belonging, e.g., to the following Pseudomonas species: Pseudomonas azotoformans; Pseudomonas brenneri; Pseudomonas cedrella; Pseudomonas corrugata; Pseudomonas extremorientalis; Pseudomonas fluorescens; Pseudomonas gessardii; Pseudomonas libanensis; Pseudomonas mandelii; Pseudomonas marginalis; Pseudomonas migulae; Pseudomonas mucidolens; Pseudomonas orientalis; Pseudomonas rhodesiae; Pseudomonas synxantha; Pseudomonas tolaasii; and Pseudomonas veronii.

Other suitable hosts include those classified in other parts of the reference, such as Gram (+) Proteobacteria. In one embodiment, the host cell is an E. coli. The genome sequence for E. coli has been established for E. coli MG1655 (Blattner, et al. (1997) The complete genome sequence of Escherichia coli K-12, Science 277(5331): 1453-74) and DNA microarrays are available commercially for E. coli K12 (MWG Inc, High Point, N.C.). E. coli can be cultured in either a rich medium such as Luria-Bertani (LB) (10 g/L tryptone, 5 g/L NaCl, 5 g/L yeast extract) or a defined minimal medium such as M9 (6 g/L Na₂HPO₄, 3 g/L KH₂PO₄, 1 g/L NH₄Cl, 0.5 g/L NaCl, pH 7.4) with an appropriate carbon source such as 1% glucose. Routinely, an overnight culture of E. coli cells is diluted and inoculated into fresh rich or minimal medium in either a shake flask or a fermentor and grown at 37° C.

A host cell can also be of mammalian origin, such as a cell derived from a mammal including any human or non-human mammal. Mammals can include, but are not limited to, primates, monkeys, porcine, ovine, bovine, rodents, ungulates, pigs, swine, sheep, lambs, goats, cattle, deer, mules, horses, monkeys, apes, dogs, cats, rats, and mice.

A host cell may also be of plant origin. Cells from any plant can be selected in which to screen for the production of a heterologous protein of interest. Examples of suitable plants include, but are not limited to, alfalfa, apple, apricot, Arabidopsis, artichoke, arugula, asparagus, avocado, banana, barley, beans, beet, blackberry, blueberry, broccoli, brussels sprouts, cabbage, canola, cantaloupe, carrot, cassava, castorbean, cauliflower, celery, cherry, chicory, cilantro, citrus, clementines, clover, coconut, coffee, corn, cotton, cranberry, cucumber, Douglas fir, eggplant, endive, escarole, eucalyptus, fennel, figs, garlic, gourd, grape, grapefruit, honey dew, jicama, kiwifruit, lettuce, leeks, lemon, lime, Loblolly pine, linseed, mango, melon, mushroom, nectarine, nut, oat, oil palm, oil seed rape, okra, olive, onion, orange, an ornamental plant, palm, papaya, parsley, parsnip, pea, peach, peanut, pear, pepper, persimmon, pine, pineapple, plantain, plum, pomegranate, poplar, potato, pumpkin, quince, radiata pine, radicchio, radish, rapeseed, raspberry, rice, rye, sorghum, Southern pine, soybean, spinach, squash, strawberry, sugarbeet, sugarcane, sunflower, sweet potato, sweetgum, tangerine, tea, tobacco, tomato, triticale, turf, turnip, a vine, watermelon, wheat, yams, and zucchini. In some embodiments, plants useful in the method are Arabidopsis, corn, wheat, soybean, and cotton.

Kits

The present invention also provides kits useful for identifying a host strain, e.g. a P. fluorescens host strain, optimal for producing a heterologous protein or polypeptide of interest. The kit comprises a plurality of phenotypically distinct host cells, wherein each population has been genetically modified to increase the expression of one or more target genes involved in protein production, to decrease the expression of one or more target genes involved in protein degradation, or both. The array may further comprise one or more populations of cells that have not been genetically modified to modulate the expression of either a gene involved in protein production or a gene involved in protein degradation. These kits may also comprise reagents sufficient to facilitate growth and maintenance of the cell populations as well as reagents and/or constructs for expression of a heterologous protein or polypeptide of interest. The populations of host cells may be provided in the kit in any manner suitable for storage, transport, and reconstitution of cell populations. The cell populations may be provided live in a tube, on a plate, or on a slant, or may be preserved either freeze-dried or frozen in a tube or vial. The cell populations may contain additional components in the storage media such as glycerol, sucrose, albumin, or other suitable protective or storage agents.

The following examples are offered by way of illustration and not by way of limitation.

EXPERIMENTAL EXAMPLES

Overview

Heterologous protein production often leads to the formation of insoluble or improperly folded proteins, which are difficult to recover and may be inactive. Furthermore, the presence of specific host cell proteases may degrade the protein of interest and thus reduce the final yield. There is no single factor that will improve the production of all heterologous proteins. Thus, a method was sought to identify factors specific to a particular heterologous protein from a pool of likely candidates.

Using Systems Biology tools, the P. fluorescens genome was mined to identify host cell protein folding modulator and protease genes. Then, global gene expression analyses were performed to prioritize upregulated targets, and, thereafter, novel protein production strains were constructed. As a result, a “Pfenex Strain Array” was assembled consisting of a plurality of phenotypically distinct P. fluorescens host strains that are deficient in host-cell proteases or allow the co-overexpression of protein folding modulators. This strain array can be used to screen for factors that specifically enhance the yield or quality of certain heterologous proteins. Providing a plurality of phenotypically distinct host strains increases the chance of success of identifying a host strain that will increase the production of any individual heterologous protein of interest.

This invention provides an improvement in the production of heterologous proteins in Pseudomonas fluorescens. Having available a library of host strains in the same genetic background allows the rapid screening and identification of factors that increase the yield and/or quality of heterologously expressed proteins. The genome sequence of P. fluorescens has been annotated and targeted host cell folding modulators and proteases have been identified. Folding modulators assist in the proper folding of proteins and include chaperones, chaperonins, peptidyl-prolyl isomerases (PPIases), and disulfide bond formation proteins. Proteases can degrade the protein of interest and thus affect heterologous protein yield and quality. Using background knowledge from the literature and DNA microarray analyses to identify likely targets, a list of about 80 target genes was assembled. In host cells that have the same genetic background, these genes were either removed from the genome or cloned into plasmids to enable co-overexpression along with heterologous proteins. The resulting strains were arrayed in 96-well format and, after transformation of plasmids that express the heterologous protein of interest, were screened for improved protein yield and/or quality.

Example 1 Identification of Folding Modulator Genes in the Genome of P. fluorescens Strain MB214

Folding modulators are a class of proteins present in all cells which aid in the folding, unfolding and degradation of nascent and heterologous polypeptides. Folding modulators include chaperones, chaperonins, peptidyl-prolyl cis-trans isomerases, and proteins involved in protein disulfide bond formation. As a first step to construct novel production strains with the ability to help fold heterologous proteins, the P. fluorescens genome was mined to identify host cell folding modulator genes.

Each of the 6,433 predicted ORFs of the P. fluorescens MB214 genome was analyzed for the possibility that it encoded a folding modulator using the following method. Several folding modulators of interest had already been identified by Dow researchers by analysis of the genome annotation (Ramseier et al. 2001). Homologs of these starting proteins were identified using protein/protein BLAST with the starting protein as the query and a database of all MB214 translated ORFs as the subject. Those translated ORFs which matched the query proteins with significant homology were added to the list for further analysis. Significant homology is defined here as having an e-score of 1e-30 or less with allowances made for human judgment based on the length and quality of the alignment. The intention of this study was to be very inclusive to maximize the chance that all potential folding modulators would be identified.
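The e-score cutoff described above can be illustrated with a short Python sketch. This is not the procedure actually used in the study. It assumes the BLAST hits are available as tab-separated rows in the common 12-column tabular layout (query id, subject id, ..., e-value, bit score); the function name, the ORF identifiers, and the numbers in the example rows are hypothetical. The human-judgment step based on alignment length and quality is not modeled.

```python
E_VALUE_CUTOFF = 1e-30  # cutoff stated in the text for folding modulators


def significant_hits(rows, cutoff=E_VALUE_CUTOFF):
    """Return the set of subject ORF ids having at least one hit at or below cutoff.

    rows: iterable of tab-separated lines in a 12-column BLAST tabular layout,
    where column 2 is the subject (translated ORF) id and column 11 the e-value.
    """
    keep = set()
    for line in rows:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subject_orf = fields[1]
        e_value = float(fields[10])
        if e_value <= cutoff:
            keep.add(subject_orf)
    return keep


# Hypothetical example rows; identifiers and values are invented for illustration.
example = [
    "dnaK_query\tORF_0001\t62.1\t640\t230\t5\t1\t638\t1\t636\t1e-180\t650",
    "dnaK_query\tORF_0002\t28.0\t120\t80\t3\t10\t125\t4\t120\t2e-05\t45.1",
]
print(sorted(significant_hits(example)))
```

Only ORF_0001 passes the cutoff in this toy input; a borderline hit such as ORF_0002 would be the kind of case left to manual review.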

More ORFs were added to the list based on their curated function from the previous annotation containing the keyword “chaperone”. Finally, the ORFs were analyzed by the protein signature family searching program InterProScan (Quevillon et al. 2005) against the InterPro Database version 7.0 (Mulder et al. 2005). The ORFs were assigned protein families by the InterProScan software as well as Gene Ontology (GO) categories associated with those families (Gene Ontology Consortium. 2004). Using these automatic GO assignments, all of the ORFs which had been assigned the GO terms “GO:0006457 Biological Process: protein folding” or “GO:0003754 Molecular Function: chaperone activity” were added to the list for further analysis.
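As a minimal sketch of the GO-term selection step, the following Python function picks out ORFs annotated with either of the two GO identifiers named above. The mapping of ORF ids to GO ids is assumed to have been extracted from InterProScan output beforehand; that data structure, the function name, and the example ORF ids are hypothetical.

```python
# GO identifiers quoted in the text
FOLDING_GO_TERMS = {"GO:0006457", "GO:0003754"}


def orfs_with_folding_go(go_by_orf, terms=FOLDING_GO_TERMS):
    """Return a sorted list of ORF ids annotated with at least one target GO id.

    go_by_orf: dict mapping an ORF id to an iterable of GO ids (assumed input).
    """
    return sorted(orf for orf, gos in go_by_orf.items() if terms & set(gos))


# Hypothetical annotations for illustration only.
example_annotations = {
    "ORF_0001": ["GO:0006457", "GO:0005524"],
    "ORF_0002": ["GO:0016787"],
    "ORF_0003": ["GO:0003754"],
}
print(orfs_with_folding_go(example_annotations))
```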

The list was then analyzed to remove ORFs which had a low probability of encoding folding modulators. Again, the intent of this study was to be very inclusive but many of the ORFs assigned to the list by these semi-automated methods could be easily identified as not coding for folding modulators based on limited criteria and human judgment.

The most common reason for excluding an ORF was weak evidence that the ORF actually encodes a folding modulator, i.e. ORFs which had been assigned to the list based on the previous annotation where the reasoning for annotating the ORF as a folding modulator was either unclear or contradictory. InterProScan is actually a conglomerate of different programs and some of these programs are considered to be more reliable than others. If an ORF was assigned to the list based solely on the output of the ScanRegExp or ProfileScan components then it was removed. The final list of P. fluorescens folding modulators has 43 members and is shown in Table 1.

Example 2 Identification of Protease Genes in the Genome of P. fluorescens Strain MB214

Proteases are enzymes that hydrolyze peptide bonds and are necessary for the survival of all living creatures. However, their role in the cell means that proteases can be detrimental to recombinant protein yield and/or quality in any heterologous protein expression system, which also includes the Pfenex Expression Technology™. As a first step to construct novel production strains that have protease genes removed from the genome, the P. fluorescens genome was mined to identify host cell protease genes.

Each of the 6,433 predicted ORFs of the P. fluorescens MB214 genome was analyzed for the possibility that it encoded a protease using the following method. The MEROPS database is manually curated by researchers at the Wellcome Trust Sanger Institute, Cambridge, UK (Rawlings et al. 2006, http://merops.sanger.ac.uk). It is a comprehensive list of proteases discovered both through laboratory experiments as well as by homology to known protease families. One of the strengths of the database is the MEROPS hierarchical classification scheme. In this system, homologs which share the same function are grouped together into families. Families are grouped into clans based on evolutionary relatedness, which in turn is based on similar structural characteristics. The method relies heavily on the database to identify protease homologs within the P. fluorescens genome.

Homologs to the MEROPS database were identified using protein/protein BLAST with each MB214 translated ORF as the query and a database of all of the MEROPS proteins as the subject. Those translated ORFs that matched MEROPS proteins with significant homology were added to the list for further analysis. Significant homology in this case is defined as having an e-score of 1e-60 or less with allowances made for human judgment based on the length and quality of the alignment. This step yielded 109 potential proteases for the list.

The ORFs were also analyzed by the protein signature family searching program InterProScan (Quevillon et al. 2005) against the InterPro Database version 7.0 (Mulder et al. 2005). The ORFs were assigned protein families by the InterProScan software as well as Gene Ontology (GO) categories associated with those families (Gene Ontology Consortium. 2004). Using these automatic GO assignments, all of the ORFs which had been assigned a GO name that contained the strings “peptidase”, “protease” or “proteolysis” were added to the list for further analysis. This step yielded an additional 70 potential proteases that had not been identified in the previous step.

More ORFs were added to the list based on their curated function from the previous annotation (Ramseier et al. 2001) containing the keywords “peptidase” or “protease”. This step yielded 32 potential proteases that again had not been identified in the previous steps.
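The two keyword-based steps each add only candidates that earlier steps missed. A minimal Python sketch of that bookkeeping is given below, assuming each ORF has a list of descriptive names (GO names or annotation text); the function name, inputs, and example ORF ids are hypothetical and the matching is a plain case-insensitive substring test, which may differ from what was actually done.

```python
GO_NAME_KEYWORDS = ("peptidase", "protease", "proteolysis")  # strings quoted in the text
ANNOTATION_KEYWORDS = ("peptidase", "protease")              # keywords quoted in the text


def new_candidates(names_by_orf, keywords, already_listed):
    """Return ORF ids whose names contain any keyword and are not yet on the list.

    names_by_orf: dict mapping ORF id to a list of descriptive strings (assumed input).
    already_listed: ORF ids found in earlier steps.
    """
    hits = {
        orf
        for orf, names in names_by_orf.items()
        if any(k in name.lower() for name in names for k in keywords)
    }
    return hits - set(already_listed)


# Hypothetical data for illustration only.
blast_step = {"ORF_0010"}
go_names = {
    "ORF_0010": ["serine-type endopeptidase activity"],
    "ORF_0011": ["proteolysis"],
    "ORF_0012": ["DNA binding"],
}
go_step = new_candidates(go_names, GO_NAME_KEYWORDS, blast_step)
print(sorted(go_step))
```

With the counts reported in the text, the three steps together give 109 + 70 + 32 = 211 candidates before manual curation.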

The list was then analyzed to remove ORFs which had a low probability of encoding proteases. Again, the intent of this study was to be very inclusive but many of the ORFs assigned to the list by these semi-automated methods could be easily identified as not coding for proteases based on limited criteria and human judgment. The two most common reasons for excluding genes were weak evidence that a certain ORF is actually a protease, or that a particular gene showed greatest homology with another protein known to be a protease homolog but not a protease itself. The final list of P. fluorescens proteases has 90 members and is shown in Table 2.

Example 3 In Silico Cellular Location Prediction of the Folding Modulator and Protease Proteins

One of the strengths of the Pfenex Expression Technology™ is its ability to control the cellular compartment to which a particular heterologous protein can be segregated. Thus, the cellular compartments where the identified host cell folding modulator and protease proteins are located were predicted. To make these predictions, two programs were chosen. PsortB 2.0 combines the results of 12 separate algorithms, which predict the subcellular location of a given peptide. The majority of the algorithms rely on detecting homology between the query protein and proteins of known subcellular localization. PsortB also includes algorithms such as HMMTOP and SignalP, which detect the presence of transmembrane folding domains or type I secretion signal sequences, respectively, using Hidden Markov Models (HMM). In addition to the PsortB results, SignalP HMM was used to predict the presence of type I secretion signal sequences. This was necessary because the output of PsortB can be vague when a signal sequence is detected but no other specific information indicating the subcellular location is given. In these cases, PsortB indicates that the subcellular localization of the protein is unknown, because it really could segregate to any one of the cytoplasmic membrane, periplasm, outer membrane or extracellular compartments. However, knowing that the protein is probably not located in the cytoplasm is informative enough to be worth noting in the table. Thus, Table 2 lists the results of the PsortB algorithm except in cases where that result was unknown. In these cases the result of SignalP HMM alone is given with “Signal Peptide” indicating that a signal peptide was detected and “Non Secretory” indicating that no signal peptide was detected.
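The reporting rule described above (use the PsortB result unless it is unknown, otherwise fall back to SignalP HMM) can be written as a small Python function. This is an illustrative sketch only: the function name and the representation of the two program outputs as a string and a boolean are assumptions, not the format produced by either tool.

```python
def reported_location(psortb_result, signalp_detects_signal_peptide):
    """Return the location label to report for one protein.

    psortb_result: PsortB localization as a string; "Unknown" marks no call.
    signalp_detects_signal_peptide: True if SignalP HMM detected a signal peptide.
    """
    if psortb_result.strip().lower() != "unknown":
        return psortb_result
    # PsortB gave no call, so report the SignalP HMM outcome alone.
    return "Signal Peptide" if signalp_detects_signal_peptide else "Non Secretory"


print(reported_location("Periplasmic", True))
print(reported_location("Unknown", True))
print(reported_location("Unknown", False))
```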

Example 4 Construction of Plasmids that Enable the Co-Overexpression of Folding Modulators

Folding modulator genes were cloned into a plasmid derivative of pCN (Nieto et al. 1990), which is compatible with another plasmid that routinely is used to express the heterologous protein of interest (Squires et al. 2004; Chew et al. 2005). The construction of a mannitol-inducible grpE-dnaKJ-containing plasmid is exemplified. Other folding modulators either as a single gene or as multiple genes when organized in operons were cloned similarly as outlined below.

Employing genomic DNA isolated from P. fluorescens MB214 (DNeasy; Qiagen, Valencia, Calif.) as a template and primers RC199 (5′-ATATACTAGTAGGAGGTAACTTATGGCTGACGAACAGACGCA-3′) (SEQ ID NO:1) and RC200 (5′-ATATTCTAGATTACAGGTCGCCGAAGAAGC-3′) (SEQ ID NO:2), the grpE-dnaKJ genes were amplified using PfuTurbo (Stratagene, La Jolla, Calif.) as per the manufacturer's recommendations. The resulting 4 kb PCR product was digested with SpeI and XbaI (restriction sites underlined in the primers above) and ligated into pDOW2236, which is a derivative of pDOW1306-6 (Schneider et al. 2005b), to create pDOW2240 containing the grpE-dnaKJ operon under control of the tac promoter. Plasmid pDOW2240 was then digested with SpeI and HindIII and the resulting grpE-dnaKJ-containing 4.0 kb DNA fragment was gel-purified using Qiaquick (Qiagen, Valencia, Calif.) and ligated into pDOW2247, which is a derivative of pCN carrying the P. fluorescens mannitol-regulated promoter (Schneider et al. 2005a), that was also digested with SpeI and HindIII. The resulting plasmid, pDOW3501, contained the grpE-dnaKJ operon under the control of the mannitol promoter. Plasmid pDOW3501 was then transformed into DC388 and other uracil-auxotrophic strains by selecting on M9 glucose plates supplemented with 250 μg/mL uracil.
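Because underlining does not survive in plain text, the positions of the SpeI and XbaI sites in the two primers can be located with a short Python sketch. The primer sequences are copied from the paragraph above, and the recognition sequences used (SpeI: ACTAGT, XbaI: TCTAGA) are the standard ones for these enzymes; the function name is hypothetical and positions are reported as 0-based offsets from the 5′ end.

```python
SITES = {"SpeI": "ACTAGT", "XbaI": "TCTAGA"}  # standard recognition sequences

RC199 = "ATATACTAGTAGGAGGTAACTTATGGCTGACGAACAGACGCA"  # SEQ ID NO:1
RC200 = "ATATTCTAGATTACAGGTCGCCGAAGAAGC"              # SEQ ID NO:2


def find_sites(seq, sites=SITES):
    """Return {enzyme: [0-based start positions]} for every site present in seq."""
    found = {}
    for name, motif in sites.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)
        if positions:
            found[name] = positions
    return found


print("RC199:", find_sites(RC199))
print("RC200:", find_sites(RC200))
```

In both primers the site begins directly after the four-base 5′ ATAT overhang.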

Example 5 Construction of P. fluorescens Strains with Genomic Deletions of Protease Genes

Plasmids that enabled the creation of genomic deletions were constructed by amplification of 500-1000 bp DNA fragments both 5′ and 3′ of the gene to be deleted. The resulting 5′ PCR product typically ends with the translational initiation codon (ATG or GTG or TTG) of the gene to be deleted while the 3′ PCR product typically begins with the stop codon (TAA or TGA or TAG) of the gene to be deleted. These two PCR products were fused together through an additional amplification step using SOE PCR (Horton et al. 1990), then cloned into pDOW1261 (FIG. 1) (Chew et al. 2005).
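The layout of the two flanking fragments can be sketched in Python. This is a simplified illustration under stated assumptions, not the primer design actually used: the gene is assumed to lie on the forward strand of a sequence held as a string, the coordinates are 0-based with an exclusive end, and the function name and toy sequence are hypothetical. The toy example uses an 8-base flank for readability, whereas the text describes fragments of 500-1000 bp.

```python
def deletion_flanks(genome, gene_start, gene_end, flank_len=500):
    """Return (upstream, downstream, fused) fragments for a gene deletion.

    genome[gene_start:gene_end] is assumed to be the gene on the forward
    strand, beginning with its start codon and ending with its stop codon.
    The upstream fragment ends with the start codon and the downstream
    fragment begins with the stop codon, as described in the text.
    """
    up_end = gene_start + 3      # include the 3-base start codon
    down_start = gene_end - 3    # include the 3-base stop codon
    upstream = genome[max(0, up_end - flank_len):up_end]
    downstream = genome[down_start:down_start + flank_len]
    return upstream, downstream, upstream + downstream


# Toy sequence for illustration: 10 bases, a short gene, 10 more bases.
toy_genome = "C" * 10 + "ATG" + "GCT" * 4 + "TAA" + "G" * 10
up, down, fused = deletion_flanks(toy_genome, gene_start=10, gene_end=28, flank_len=8)
print(up, down, fused)
```

In the fused product the start codon is followed directly by the stop codon, so the coding region between them is absent.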

Example 6 High-Through-Put Growth and Analysis of Heterologous Protein Expression in P. fluorescens Strains: Monoclonal Antibody

Plasmid pDOW2787 encodes the monoclonal antibody (mAb) gal2; the heavy chain is expressed with a Pbp secretion leader and under control of the tac promoter. The light chain is expressed with an OprF secretion leader and under control of the mannitol promoter. The plasmid was electroporated into competent cells of 63 strains carrying either a directed gene deletion or pDOW2247 carrying a folding modulator for co-expression, and five control strains containing a wild type strain. Cells were cultured in replicate deep-well blocks containing growth medium with glycerol by shaking at 300 rpm. Protein expression was induced at 24 hrs with 0.1 mM isopropyl β-D-thiogalactopyranoside (IPTG) and 1% mannitol. At 24 hrs post-induction, aliquots were lysed and antigen binding was measured to quantitate amounts of active antibody. The value was divided by OD₆₀₀ to measure cell specific activity. Strains Δprc1, ΔdegP2, ΔLa2, ΔclpP, Δprc2, the grpEdnaKJ co-expression strain, Δtig, ΔclpX, and Δlon were all 2.4-fold or more higher than the control strains, which was statistically significant (p<0.5). Soluble cell fractions were prepared from Δprc1, ΔdegP2, ΔLa2 and the grpEdnaKJ co-expression strain and subjected to Western analysis (FIG. 2). A band with a size consistent with fully assembled antibody was detected in the four test strains, but not in the control.
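The cell specific activity calculation and the comparison to control strains can be sketched as follows. This is a minimal illustration, assuming the control value is taken as the mean over the control strains (the text does not state how the controls were combined); the function names and all numeric values are hypothetical.

```python
from statistics import mean


def specific_activity(binding_signal, od600):
    """Antigen-binding signal divided by OD600, as described in the text."""
    return binding_signal / od600


def fold_over_control(strain_value, control_values):
    """Ratio of a strain's specific activity to the mean of the controls.

    Using the mean of the controls is an assumption made for this sketch.
    """
    return strain_value / mean(control_values)


# Invented numbers for illustration only.
controls = [specific_activity(10.0, 5.0), specific_activity(15.0, 5.0)]  # 2.0 and 3.0
test_strain = specific_activity(30.0, 5.0)                                # 6.0
print(round(fold_over_control(test_strain, controls), 2))
```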

Example 7 High Throughput Evaluation of Protein Expression in E. coli and P. fluorescens

Construction of C-Terminal His-Tag Expression Clones

Seven open reading frames (ORFs) were amplified for ligation into the NheI-XhoI sites of the periplasmic vector pDOW3718: MAP2K3, ApoAI, hGH, gal2 scFv, gal13 scFv, EPO, and IL2. Primers were designed with a NheI restriction site on the 5′ primer and a XhoI restriction site on the 3′ primer. PCR reactions were performed using Platinum PCR Supermix (Invitrogen cat#1306-016) and PCR products were digested with NheI and XhoI in NEBuffer 2 (New England Biolabs), incubating at 37° C. overnight, then purified using Qiaquick Extraction kit (Qiagen). The digested products were then ligated to NheI-XhoI digested pDOW3718 using T4 DNA ligase (NEB). Ligation products were transformed into electrocompetent P. fluorescens DC454 and transformants were selected on LB agar supplemented with 250 μg/mL uracil and 30 μg/mL tetracycline.

The same seven ORFs were also amplified and prepared for ligation into the NcoI-XhoI sites of pET22b (Novagen) for expression in E. coli. Primers were designed with an NcoI restriction site on the 5′ primer, and an XhoI restriction site (HindIII for MAP2K3 and SalI for ApoAI) on the 3′ primer. PCR reactions and restriction digestion were performed as described above with the exception that restriction enzymes NcoI, HindIII, SalI and XhoI were used as required. The digested products were ligated to NcoI-XhoI digested pET22b using T4 DNA ligase (NEB), and the ligation products were transformed into chemically competent E. coli Top10 cells. Transformants were selected on LB agar ampicillin plates (Teknova). Plasmid DNA was prepared (Qiagen) and screened for insert by PCR using T7 promoter and T7 terminator primers. Positive clones were sequenced to confirm insert sequence. One confirmed cloned plasmid for each was subsequently transformed into BL21(DE3) (Invitrogen) for expression analysis.

High Throughput Expression Analysis

The P. fluorescens strains were grown using a high throughput expression protocol. Briefly, seed cultures, grown in LB medium supplemented with 250 μg/mL uracil and 15 μg/mL tetracycline, were used to inoculate 0.5 mL of defined minimal salts medium without yeast extract (Teknova 3H1130) supplemented with 250 μg/mL uracil, 15 μg/mL tetracycline, and 5% glycerol as the carbon source in a 2.0 mL deep 96-well microtiter plate. Following an initial growth phase at 30° C. (24 hours), expression via the Ptac promoter was induced with 0.3 mM isopropyl-β-D-1-thiogalactopyranoside (IPTG).

The E. coli strains were grown in a 2.0 mL deep 96-well plate using Overnight Express™ autoinduction medium (Novagen). Briefly, seed cultures grown in LB medium supplemented with 100 μg/mL ampicillin (LBAmp) were used to inoculate 0.5 mL of LBAmp+Overnight Express™ prepared according to the manufacturer's protocol. The cultures were allowed to grow for 24 hours.

Cultures were sampled at the time of induction (I0), and at 24 hours post induction (I24). Cell density was measured by optical density at 600 nm (OD₆₀₀), and 25 μL of whole broth was removed at I24 and stored at −20° C. for later processing. The remainder of the culture (~400 μL) was transferred to Eppendorf tubes and centrifuged at 20,000×g for 2 minutes. The cell free broth fractions were removed to a 96-well plate and stored at −20° C., as were the cell pellets.

SDS-PAGE and Western Analyses

Soluble and insoluble fractions from culture samples were generated using Easy Lyse™ (Epicentre Technologies cat#RP03750). The 25 μL whole broth sample was lysed by adding 175 μL of Easy Lyse™ buffer and incubating with gentle rocking at room temperature for 30 minutes. The lysate was centrifuged at 14,000 rpm for 20 minutes (4° C.) and the supernatant removed. The supernatant was saved as the soluble fraction. The pellet (insoluble fraction) was then resuspended in an equal volume of lysis buffer by pipetting up and down. For selected clones, cell free broth samples were thawed and analyzed without dilution. Samples were mixed 1:1 with 2× Laemmli sample buffer containing β-mercaptoethanol (BioRad cat# 161-0737) and boiled for 5 minutes prior to loading 20 μL on a Bio-Rad 4-12% Criterion XT gel (BioRad cat# 345-0124) and electrophoresis in 1×MES buffer (cat.# 161-0789). Gels were stained with Simply Blue Safe Stain (Invitrogen cat# LC6060) according to the manufacturer's protocol and imaged using the Alpha Innotech Imaging system.

Soluble and insoluble fractions prepared and separated by SDS-PAGE as described above were transferred to nitrocellulose (BioRad cat# 162-0232) using 1× transfer buffer (Invitrogen cat# NP0006) prepared according to manufacturer's protocol, for 1.5-2 hours at 100 V. After transfer, the blot was washed briefly in 1×PBS and then blocked overnight in Blocker Casein in PBS (Pierce cat# 37528) at 4° C. The diluent was poured off and more diluent was added containing a 1:5,000 dilution of anti-histidine-HRP antibody. The blots were incubated 2 hours at room temperature. The diluent/antibody solution was then poured off and the blots washed in 1× PBST (Sigma #P-3563) with vigorous shaking for 5 minutes. The PBST was changed and washing was repeated twice. For development, the blots were removed from the PBST solution and immersed in prepared solution using the Immunopure Metal Enhanced DAB Substrate Kit (Pierce cat#34065). The blots were incubated with gentle shaking for 10 minutes and then removed from the solution and allowed to dry on paper. The blots were imaged, and densitometry was performed using an Alpha Innotech FluorImager.

HTP Expression Analysis of E. coli and P. fluorescens Recombinant Strains

P. fluorescens and E. coli strains were grown in 0.5 mL cultures in a 96-well format to evaluate expression of a variety of human proteins as well as 2 single chain antibodies. Each protein was cloned into the P. fluorescens periplasmic expression vector pDOW3718, and the E. coli periplasmic expression vector pET22b, in frame with a C-terminal 6× histidine tag. P. fluorescens cultures were grown in Dow's standard high throughput medium, and E. coli cultures were grown in the autoinduction medium Overnight Express™. Growth of P. fluorescens expression strains was observed to reach A₆₀₀ units of 20-25 at the time of induction and ~25-45 post induction.

FIG. 3 shows growth curves for P. fluorescens (filled symbols) and E. coli (open symbols) expression clones. Elapsed fermentation time in hours is shown on the X-axis and optical density measured at 600 nm (A₆₀₀) is shown on the Y-axis. The arrow indicates time of induction of P. fluorescens cultures.

The E. coli constructs reached an A₆₀₀ of ~5-10 units at the time of harvest, with the exception of 1 strain, which reached an A₆₀₀ of ~25 units after 24 hours of growth in autoinduction medium (FIG. 3, open circle). SDS-PAGE and Western analyses (FIG. 4) showed expression of recombinant protein for P. fluorescens in all cases tested, and for E. coli in all but one case tested (EPO). Differences in expression levels and solubility between strains were readily detectable. P. fluorescens showed an advantage in solubility for MAP2K3, Gal2 and Gal13 scFvs (1), hGH and IL2. Moreover, an advantage in secretion leader processing in P. fluorescens was observed for ApoAI, Gal2 scFv, hGH and IL2. In E. coli, the pelB leader appeared to be unprocessed from these proteins by SDS-PAGE and Western analyses.

Example 8 High-Through-Put Growth and Analysis of Heterologous Protein Expression in P. fluorescens Strains: Increasing Expression of Interferon Alpha 2a

Construction of Protein Expression Plasmids

Standard cloning methods are used in the construction of plasmids that overexpress Interferon alpha 2a. The fragment containing the coding sequences is subcloned into 16 different expression vectors. The expression vectors each contain a periplasmic secretion leader, as shown in Table 11.

TABLE 11
Expression vectors

Expression Vector    Secretion Leader
pDOW5201             Pbp
pDOW5206             DsbA
pDOW5209             Azu
pDOW5217             LAO
pDOW5220             Ibp S31A
pDOW5223             TolB
pDOW5226             Trc
pDOW5235             FlgI
pDOW5238             CupC2
pDOW5241             CupB2
pDOW5244             CupA2
pDOW5247             nikA
pDOW5256             PorE
pDOW5259             pbpA20V
pDOW5262             DsbC
pDOW5265             Bce

For the subcloning, a plasmid containing the coding sequence for a heterologous protein to be overexpressed is digested with appropriate restriction enzymes. The expression vectors are digested with the same restriction enzymes. The insert and vector DNA are ligated overnight with T4 DNA ligase (New England Biolabs; M0202S), then electroporated into competent P. fluorescens DC454 cells. The transformants are plated on M9 glucose agar (Teknova) and screened for an insert by PCR. Positive clones are selected and sequence-confirmed on both strands. Each sequence-confirmed plasmid is transformed into selected P. fluorescens host strains in a 96-well format, to obtain an expression system comprising the host strain and an expression vector. Expression of the coding gene is driven by an appropriate promoter, e.g., Ptac.

Transformation into the P. fluorescens host strains DC485 and DC487 is performed as follows: twenty five microliters of competent cells are thawed and transferred into a 96-well electroporation plate (BTX ECM630 Electroporator), and 1 μl miniprep plasmid DNA is added to each well. Cells are electroporated and subsequently resuspended in 75 μl HTP media with trace minerals, transferred to a 96-well deep well plate with 400 μl M9 salts 1% glucose medium, and incubated at 30° C. with shaking for 48 hours.

Growth and Expression in 96-Well Format

Expression of the recombinant protein is evaluated under standard induction conditions at the HTP 96-well plate scale. The expression systems, each containing one of the 16 expression constructs, are grown in triplicate and expression from the heterologous gene promoter is induced. A standard expression system, e.g., DC454 transformed with one of the heterologous protein expression vectors used, is included on the array. A null strain, DC432 containing a vector without an expression insert, is also included.

Ten microliters of seed culture is transferred into triplicate 96-well deep well plates, each well containing 500 μl of HTP-YE medium, and incubated as before for 24 hours. Isopropyl-β-D-1-thiogalactopyranoside (IPTG) is added to each well for a final concentration of 0.3 mM to induce recombinant protein expression, 1% mannitol is used to induce expression of genes (e.g., encoding folding modulators or proteases having potential chaperone activity) present on secondary expression vectors, and the temperature is reduced to 25° C. Twenty four hours after induction, cells are normalized to OD₆₀₀=20. Samples can be normalized in phosphate buffered saline, pH 7.4, to a final volume of 400 μL in cluster tubes, e.g., using the Biomek FX liquid-handling system (Beckman Coulter), and frozen at −80° C. for later processing.
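Normalizing a culture to a target OD₆₀₀ is a simple dilution calculation (culture volume × measured OD = final volume × target OD). The following Python sketch shows that arithmetic; it assumes OD₆₀₀ scales linearly with dilution, the function name is hypothetical, and the measured density and final volume in the example are illustrative values rather than data from the text.

```python
def normalization_volumes(measured_od600, target_od600, final_volume_ul):
    """Return (culture_ul, diluent_ul) needed to reach target_od600.

    Assumes OD600 scales linearly with dilution. A culture below the target
    density cannot be normalized by dilution, so that case raises an error.
    """
    if measured_od600 < target_od600:
        raise ValueError("culture density is below the target; dilution cannot reach it")
    culture_ul = target_od600 * final_volume_ul / measured_od600
    return culture_ul, final_volume_ul - culture_ul


# Illustrative values: a culture at OD600 = 40 normalized to OD600 = 20.
culture, diluent = normalization_volumes(40.0, 20.0, 400.0)
print(culture, diluent)
```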

Sample Preparation and SDS-CGE

Soluble and insoluble fractions are prepared from the cultures by sonication followed by centrifugation. Frozen, normalized culture broth (200-400 μL) is sonicated for 10 minutes. The lysates are centrifuged at 14,000 rpm for 20 minutes (4° C.) and the supernatants removed by pipet (soluble fraction). The pellets are then resuspended in 200 μL of phosphate buffered saline (PBS), pH 7.4. Insoluble samples are prepared for SDS capillary gel electrophoresis (CGE) (Caliper Life Sciences, Protein Express LabChip Kit, Part 760301), in the presence of dithiothreitol (DTT).

Growth before induction and 24 hours after induction is analyzed using the statistical analysis software JMP. The mean OD₆₀₀ for each expression strain after an initial 24-hour growth period and following the 24-hour induction period is determined.

Soluble and insoluble fractions are analyzed by SDS-CGE to assess expression of the recombinant protein. Strains showing signal above background (e.g., expression from the DC432 null strain) corresponding to induced, soluble protein are noted. Soluble recombinant protein expression and insoluble protein expression are observed. Based on comparison of total and soluble protein yield to those in an indicator strain, expression systems representing a diversity of expression strategies are selected to evaluate at fermentation scale, and an optimal expression system is selected for overexpression of the Interferon alpha 2a.

Example 9 High-Through-Put Growth and Analysis of Heterologous Protein Expression in P. fluorescens Strains: Overexpression of a Protein in Table 9

Using a method similar to that described in Example 8, the coding sequence for a heterologous protein listed in Table 9 is cloned into a series of P. fluorescens expression vectors. The insert is confirmed by sequencing, and the vectors are transformed into P. fluorescens host cell populations. The resulting expression strains are grown, and protein expression is induced, in a 96-well format. The cultures are evaluated for heterologous protein yield. At least one optimal expression system is selected for overexpression based on the yields observed.

Table of SEQ ID NOS:

| PROTEIN FOLDING MODULATOR (RXF#) | SEQ ID NO: | PROTEASE (RXF#) | SEQ ID NO: | LEADER/RELATED SEQUENCE | SEQ ID NO: |
|---|---|---|---|---|---|
| RXF02095.1 | 3 | RXF00133.1 | 46 | pbp mutant leader - DNA sequence | |
| RXF06767.1 | 4 | RXF00285.2 | 47 | pbp mutant leader - amino acid sequence | |
| RXF01748.1 | 5 | RXF00325.1 | 48 | dsbA leader - DNA sequence | |
| RXF03385.1 | 6 | RXF00428.1 | 49 | dsbA leader - amino acid sequence | |
| RXF05399.1 | 7 | RXF00449.1 | 50 | dsbC leader - DNA sequence | |
| RXF06954.1 | 8 | RXF00458.2 | 51 | dsbC leader - amino acid sequence | |
| RXF03376.1 | 9 | RXF00561.2 | 52 | Bce leader - DNA sequence | |
| RXF03987.2 | 10 | RXF00670.1 | 53 | Bce leader - amino acid sequence | |
| RXF05406.2 | 11 | RXF00811.1 | 54 | CupA2 leader - DNA sequence | |
| RXF03346.2 | 12 | RXF01037.1 | 55 | CupA2 leader - amino acid sequence | |
| RXF05413.1 | 13 | RXF01181.1 | 56 | CupB2 leader - DNA sequence | |
| RXF04587.1 | 14 | RXF01250.2 | 57 | CupB2 leader - amino acid sequence | |
| RXF08347.1 | 15 | RXF01291.2 | 58 | CupC2 leader - DNA sequence | |
| RXF04654.2 | 16 | RXF01418.1 | 59 | CupC2 leader - amino acid sequence | |
| RXF04663.1 | 17 | RXF01590.2 | 60 | NikA leader - DNA sequence | |
| RXF01957.2 | 18 | RXF01816.1 | 61 | NikA leader - amino acid sequence | |
| RXF01961.2 | 19 | RXF01822.2 | 62 | FlgI leader - DNA sequence | |
| RXF04254.2 | 20 | RXF01918.1 | 63 | FlgI leader - amino acid sequence | |
| RXF05455.2 | 21 | RXF01919.1 | 64 | ORF5550 leader - DNA sequence | |
| RXF02231.1 | 22 | RXF01961.2 | 65 | ORF5550 leader - amino acid sequence | |
| RXF07017.2 | 23 | RXF01968.1 | 66 | Ttg2C leader - DNA sequence | |
| RXF08657.2 | 24 | RXF02003.2 | 67 | Ttg2C leader - amino acid sequence | |
| RXF01002.1 | 25 | RXF02151.2 | 68 | ORF8124 leader - DNA sequence | |
| RXF03307.1 | 26 | RXF02161.1 | 69 | ORF8124 leader - amino acid sequence | |
| RXF04890.2 | 27 | RXF02342.1 | 70 | oligonucleotide primer | |
| RXF03768.1 | 28 | RXF02492.1 | 71 | oligonucleotide primer | |
| RXF05345.2 | 29 | RXF02689.2 | 72 | First 5 amino acids of the predicted protein sequence for the processed form of dsbC-Skp | |
| RXF06034.2 | 30 | RXF02739.1 | 73 | First 10 amino acids of the predicted protein sequence for the unprocessed form of dsbC-Skp | |
| RXF06591.1 | 31 | RXF02796.1 | 74 | First 10 amino acids of the predicted protein sequence for the processed form of dsbC-Skp | |
| RXF05753.2 | 32 | RXF02980.1 | 75 | porin E1 precursor leader - DNA sequence | |
| RXF01833.2 | 33 | RXF03065.2 | 76 | porin E1 precursor leader - amino acid sequence | |
| RXF04655.2 | 34 | RXF03329.2 | 77 | Outer membrane porin F leader - DNA sequence | |
| RXF05385.1 | 35 | RXF03364.1 | 78 | Outer membrane porin F leader - amino acid sequence | |
| RXF00271.1 | 36 | RXF03397.1 | 79 | Periplasmic phosphate binding protein (pbp) leader - DNA sequence | |
| RXF06068.1 | 37 | RXF03441.1 | 80 | Periplasmic phosphate binding protein (pbp) leader - amino acid sequence | |
| RXF05719.1 | 38 | RXF03488.2 | 81 | azurin leader - DNA sequence | |
| RXF03406.2 | 39 | RXF03699.2 | 82 | azurin leader - amino acid sequence | |
| RXF04296.1 | 40 | RXF03916.1 | 83 | rare lipoprotein B precursor leader - DNA sequence | |
| RXF04553.1 | 41 | RXF04047.2 | 84 | rare lipoprotein B precursor leader - amino acid sequence | |
| RXF04554.2 | 42 | RXF04052.2 | 85 | Lysine-arginine-ornithine-binding protein leader - DNA sequence | |
| RXF05310.2 | 43 | RXF04304.1 | 86 | Lysine-arginine-ornithine-binding protein leader - amino acid sequence | |
| RXF05304.1 | 44 | RXF04424.2 | 87 | Iron(III) binding protein leader - DNA sequence | |
| RXF05073.1 | 45 | RXF04495.2 | 88 | Iron(III) binding protein leader - amino acid sequence | |
| RXF02090 | 137 | RXF04500.1 | 89 | N-terminal amino acid sequence of processed azurin and ibp | |
| RXF01181.1 | 138 | RXF04567.1 | 90 | CDS-1 DNA sequence | |
| RXF03364.1 | 139 | RXF04631.2 | 91 | CDS-1 amino acid sequence | |
| RXF03376.1 | 140 | RXF04653.2 | 92 | TrxA DNA sequence | |
| RXF04693.1 | 141 | RXF04657.2 | 93 | TrxA amino acid sequences | |
| RXF05319.1 | 142 | RXF04663.1 | 94 | TolB leader - DNA sequence | |
| RXF05445.1 | 143 | RXF04692.1 | 95 | TolB leader - amino acid sequence | |
| RXF08122.2 | 144 | RXF04693.1 | 96 | | |
| RXF06751.1 | 145 | RXF04715.1 | 97 | | |
| RXF00922.1 | 146 | RXF04802.1 | 98 | | |
| RXF03204.1 | 147 | RXF04808.2 | 99 | | |
| RXF04886.2 | 148 | RXF04920.1 | 100 | | |
| RXF05426.1 | 149 | RXF04923.1 | 101 | | |
| RXF05432.1 | 150 | RXF04960.2 | 102 | | |
| | | RXF04968.2 | 103 | | |
| | | RXF04971.2 | 104 | | |
| | | RXF05081.1 | 105 | | |
| | | RXF05113.2 | 106 | | |
| | | RXF05137.1 | 107 | | |
| | | RXF05236.1 | 108 | | |
| | | RXF05379.1 | 109 | | |
| | | RXF05383.2 | 110 | | |
| | | RXF05400.2 | 111 | | |
| | | RXF05615.1 | 112 | | |
| | | RXF05817.1 | 113 | | |
| | | RXF05943.1 | 114 | | |
| | | RXF06281.1 | 115 | | |
| | | RXF06308.2 | 116 | | |
| | | RXF06399.2 | 117 | | |
| | | RXF06451.1 | 118 | | |
| | | RXF06564.1 | 119 | | |
| | | RXF06586.1 | 120 | | |
| | | RXF06755.2 | 121 | | |
| | | RXF06993.2 | 122 | | |
| | | RXF07170.1 | 123 | | |
| | | RXF07210.1 | 124 | | |
| | | RXF07879.1 | 125 | | |
| | | RXF08136.2 | 126 | | |
| | | RXF08517.1 | 127 | | |
| | | RXF08627.2 | 128 | | |
| | | RXF08653.1 | 129 | | |
| | | RXF08773.1 | 130 | | |
| | | RXF08978.1 | 131 | | |
| | | RXF09091.1 | 132 | | |
| | | RXF09147.2 | 133 | | |
| | | RXF09487.1 | 134 | | |
| | | RXF09831.2 | 135 | | |
| | | RXF04892.1 | 136 | | |
| | | RXF00458.2 | 151 | | |
| | | RXF01957.2 | 152 | | |
| | | RXF04497.2 | 153 | | |
| | | RXF04587.1 | 154 | | |
| | | RXF04654.2 | 155 | | |
| | | RXF04892.1 | 156 | | |
| | | XFRNA203 | 157 | | |

All publications and patent applications mentioned in the specification are indicative of the level of skill of those skilled in the art to which this invention pertains. All publications and patent applications are herein incorporated by reference to the same extent as if each individual publication or patent application was specifically and individually indicated to be incorporated by reference.

Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it will be obvious that certain changes and modifications may be practiced within the scope of the appended claims.

1. A method of assembling an array of expression systems for testing expression of at least one heterologous protein, said method comprising: placing in separate addressable locations at least 10 nonidentical test expression systems, said at least 10 nonidentical test expression systems each comprising a different combination of a) a Pseudomonad or E. coli host cell population, and b) at least one expression vector encoding the at least one heterologous protein, wherein the array includes at least 5 different host cell populations and at least 2 different expression vectors, and further wherein at least 3 of said at least 5 different host cell populations are deficient in their expression of at least one protease; and wherein at least one of the nonidentical test expression systems overexpresses the at least one heterologous protein.
2. The method of claim 1, wherein the at least 2 different expression vectors each encode a different heterologous protein.
3. The method of claim 2, wherein the array includes at least 5 different expression vectors, and wherein each of said at least 5 different expression vectors encodes a different heterologous protein.
4. The method of claim 1, wherein at least one expression vector encodes 2 different heterologous proteins.
5. The method of claim 1, wherein at least 20 nonidentical test expression systems are placed in separate addressable locations, and wherein the array includes at least 10 different host cell populations and at least 2 different expression vectors, and further wherein at least 5 of said at least 10 different host cell populations are deficient in their expression of at least one protease.
6. The method of claim 1, wherein at least 50 nonidentical test expression systems are placed in separate addressable locations, and wherein the array includes at least 20 different host cell populations and at least 3 different expression vectors, and further wherein at least 10 of said at least 20 different host cell populations are deficient in their expression of at least one protease.
7. The method of claim 1 wherein the overexpression of the heterologous protein in the at least one nonidentical test expression system is an increase in yield of the heterologous protein, of about 1.5-fold to about 100-fold, relative to the yield in an indicator expression system.
8. The method of claim 1 wherein the overexpression is a yield of the heterologous protein in the at least one nonidentical test expression system of about 10 mg/liter to about 2000 mg/liter.
9. The method of claim 7 wherein the increase in yield is about 1.5-fold to about 2-fold, about 2-fold to about 3-fold, about 3-fold to about 4-fold, about 4-fold to about 5-fold, about 5-fold to about 6-fold, about 6-fold to about 7-fold, about 7-fold to about 8-fold, about 8-fold to about 9-fold, about 9-fold to about 10-fold, about 10-fold to about 15-fold, about 15-fold to about 20-fold, about 20-fold to about 25-fold, about 25-fold to about 30-fold, about 30-fold to about 35-fold, about 35-fold to about 40-fold, about 45-fold to about 50-fold, about 50-fold to about 55-fold, about 55-fold to about 60-fold, about 60-fold to about 65-fold, about 65-fold to about 70-fold, about 70-fold to about 75-fold, about 75-fold to about 80-fold, about 80-fold to about 85-fold, about 85-fold to about 90-fold, about 90-fold to about 95-fold, or about 95-fold to about 100-fold.
10. The method of claim 8 wherein the yield of the heterologous protein is about 10 mg/liter to about 20 mg/liter, about 20 mg/liter to about 50 mg/liter, about 50 mg/liter to about 100 mg/liter, about 100 mg/liter to about 200 mg/liter, about 200 mg/liter to about 300 mg/liter, about 300 mg/liter to about 400 mg/liter, about 400 mg/liter to about 500 mg/liter, about 500 mg/liter to about 600 mg/liter, about 600 mg/liter to about 700 mg/liter, about 700 mg/liter to about 800 mg/liter, about 800 mg/liter to about 900 mg/liter, about 900 mg/liter to about 1000 mg/liter, about 1000 mg/liter to about 1500 mg/liter, or about 1500 mg/liter to about 2000 mg/liter.
11. The method of claim 7 wherein the indicator expression system comprises a second nonidentical test expression system in the array, or a standard expression system.
12. The method of claim 7, wherein the yield of the heterologous protein is a measure of the amount of soluble heterologous protein, the amount of recoverable heterologous protein, the amount of properly processed heterologous protein, the amount of properly folded heterologous protein, the amount of active heterologous protein, and/or the total amount of heterologous protein.
13. The method of claim 7, further comprising selecting an optimal expression system from among the test expression systems based on the increased yield of the heterologous protein in the test expression system relative to that in the indicator expression system.
14. The method of claim 8, further comprising selecting an optimal expression system from among the test expression systems based on the yield of the heterologous protein in the test expression system.
15. A method for selecting an optimal expression system comprising using the array assembled using the method of claim 1.
16. An array assembled using the method of claim 1.
17. The method of claim 1, wherein at least 2 of said at least 5 different expression systems overexpress at least one folding modulator.
18. The method of claim 17, wherein the at least one folding modulator is selected from the folding modulators listed in Table 1 and Table 2.
19. The method of claim 17, wherein the at least one folding modulator is expressed from a plasmid.
20. The method of claim 1, wherein at least one host cell population is defective in at least one to about eight proteases.
21. The method of claim 20, wherein the at least one to about eight proteases are selected from the proteases listed in Table 1 and Table 2.
22. The method of claim 1, further comprising determining the number of cysteine residues in, the presence of clustered prolines in, the requirement of an N terminal methionine for activity of, or the presence of a small amino acid in the plus two position of, the heterologous protein.
23. The method of claim 22, wherein when the heterologous protein has more than two cysteine residues, at least one of said at least 2 different expression systems overexpressing a folding modulator overexpresses a disulfide isomerase/oxidoreductase.
24. The method of claim 22, wherein the disulfide isomerase/oxidoreductase is encoded on a plasmid.
25. The method of claim 22, wherein when the heterologous protein has more than four cysteine residues, at least one of said at least 2 different expression vectors encoding the heterologous protein contains a periplasmic secretion leader sequence.
26. The method of claim 22, wherein when the heterologous protein has more than four cysteine residues, at least one of said at least 2 different expression vectors encoding the heterologous protein contains a high or medium ribosome binding sequence.
27. The method of claim 25, further wherein said at least one of said at least 2 different expression vectors encoding the heterologous protein and containing a periplasmic secretion leader sequence is included in at least one expression system that overexpresses at least one periplasmic chaperone, and at least one expression system that overexpresses at least one cytoplasmic chaperone.
28. The method of claim 22, wherein when the heterologous protein has fewer than four cysteine residues, at least one of said at least 2 different expression vectors encoding the heterologous protein does not contain a periplasmic secretion leader sequence, and further wherein said at least one of said at least 2 different expression vectors encoding the heterologous protein and not containing a periplasmic secretion leader sequence is included in at least one expression system that overexpresses at least one cytoplasmic chaperone.
29. The method of claim 22, wherein when clustered prolines are present, at least one expression system that overexpresses at least one 2+ peptidyl-prolyl cis-trans isomerase (PPIase) is included in the array.
30. The method of claim 22, wherein the 2+ peptidyl-prolyl cis-trans isomerase (PPIase) is encoded on a plasmid.
31. The method of claim 22, wherein when the N-terminal methionine is required, at least one expression system comprising a host cell population that has at least one defect in at least one methionyl amino peptidase, is included in the array.
32. The method of claim 22 wherein when a small amino acid is present in the plus two position of the heterologous protein, at least one expression system comprising a host cell population that has at least one defect in at least one amino peptidase, is included in the array.
33. The method of claim 22, wherein the small amino acid is selected from the group consisting of: glycine, alanine, valine, serine, threonine, aspartic acid, asparagine, and proline.
34. The method of claim 1, wherein the heterologous protein is one of the following: a toxin; a cytokine, growth factor or hormone, or receptor thereof; an antibody or antibody derivative; a human therapeutic protein or therapeutic enzyme; a non-natural protein or a fusion protein; a chaperone; a pathogen protein or pathogen-derived antigen; a lipoprotein; a reagent protein; or a biocatalytic enzyme.
35. The method of claim 34, wherein the toxin is a vertebrate or invertebrate animal toxin, a plant toxin, a bacterial toxin, a fungal toxin, or variant thereof.
36. (canceled)
37. (canceled)
38. The method of claim 34, wherein the antibody or antibody derivative is a humanized antibody, modified antibody, nanobody, bispecific antibody, single-chain antibody, Fab, Domain antibody, shark single domain antibody, camelid single domain antibody, linear antibody, diabody, or BiTE molecule.
39. (canceled)
40. (canceled)
41. (canceled)
42. (canceled)
43. (canceled)
44. (canceled)
45. (canceled)
46. The method of claim 1, wherein at least 10% of the heterologous protein is insoluble when expressed in an indicator strain, or wherein the heterologous protein is predicted to be insoluble using a protein solubility prediction tool.
47. A method for selecting an optimal expression system for overexpressing at least one heterologous protein, said method comprising: assembling an array by placing in separate addressable locations at least 10 nonidentical test expression systems, said at least 10 nonidentical test expression systems each comprising a different combination of a) a Pseudomonad or E. coli host cell population, and b) at least one expression vector encoding the at least one heterologous protein wherein the array includes at least 5 different host cell populations and at least 2 different expression vectors, and further wherein at least 3 of said at least 10 different host cell populations are deficient in their expression of at least one protease; measuring the yield of the heterologous protein expressed; and selecting at least one optimal expression system from among the test expression systems based on the yield of the heterologous protein measured.
48. The method of claim 47 wherein the yield of the heterologous protein is about 1.5-fold to about 100-fold higher in the at least one optimal expression system relative to that in an indicator expression system.
49. The method of claim 47 wherein the yield of the heterologous protein in the at least one optimal expression system is about 10 mg/liter to about 2000 mg/liter.
50. The method of claim 48 wherein the indicator expression system comprises a second nonidentical test expression system in the array or a standard expression system.
51. The method of claim 48, wherein the yield of the heterologous protein is a measure of the amount of soluble heterologous protein, the amount of recoverable heterologous protein, the amount of properly processed heterologous protein, the amount of properly folded heterologous protein, the amount of active heterologous protein, and/or the total amount of heterologous protein.
52. An array of expression systems for testing expression of at least one heterologous protein, said array comprising: at least 10 nonidentical test expression systems in separate addressable locations, said at least 10 nonidentical test expression systems each comprising a different combination of a) a Pseudomonad or E. coli host cell population, and b) at least one expression vector encoding at least one heterologous protein, wherein the array includes at least 5 different host cell populations and at least 2 different expression vectors, and further wherein at least 3 of said at least 5 different host cell populations are deficient in their expression of at least one protease; and wherein at least one of the nonidentical test expression systems overexpresses the heterologous protein.
53. The method of claim 8, wherein the yield of the heterologous protein is a measure of the amount of soluble heterologous protein, the amount of recoverable heterologous protein, the amount of properly processed heterologous protein, the amount of properly folded heterologous protein, the amount of active heterologous protein, and/or the total amount of heterologous protein.
54. The method of claim 49, wherein the yield of the heterologous protein is a measure of the amount of soluble heterologous protein, the amount of recoverable heterologous protein, the amount of properly processed heterologous protein, the amount of properly folded heterologous protein, the amount of active heterologous protein, and/or the total amount of heterologous protein.
55. A method of assembling an array of expression systems for testing expression of at least one heterologous protein, said method comprising: placing in separate addressable locations at least 10 nonidentical test expression systems, said at least 10 nonidentical test expression systems each comprising a different combination of a) a Pseudomonad or E. coli host cell population, and b) at least one expression vector encoding the at least one heterologous protein, wherein the array includes at least 4 different host cell populations and at least 2 different expression vectors, and further wherein at least 2 of said at least 4 different host cell populations are deficient in their expression of at least one protease; and wherein at least one of the nonidentical test expression systems overexpresses the at least one heterologous protein.